Let $\{X_n\}$ be a discrete-time stationary moving-average process having the representation $X_n = \sum_{j=1}^{M} A_j Y_{n-j}$, where the real-valued process $\{Y_n\}$ has a well-defined entropy and spectrum. Let $\epsilon_k^{*2}$ denote the smallest mean-squared error of any estimate of $X_n$ based on observations of $X_{n-1}, X_{n-2}, \ldots, X_{n-k}$, and let $\epsilon_{k,\mathrm{lin}}^{*2}$ be the corresponding least mean-squared error when the estimator is linear in the $k$ observations. We establish an inequality of the form $\epsilon_k^{*2} \ge G(Y)\,\epsilon_{k,\mathrm{lin}}^{*2}$, where $G(Y) \le 1$ depends only on the entropy and spectrum of $\{Y_n\}$. We also obtain explicit formulas for $\epsilon_k^{*2}$ and $\epsilon_{k,\mathrm{lin}}^{*2}$ and compare these quantities graphically when $M = 2$ and the $\{Y_n\}$ are i.i.d. variates with one of several different distributions. The best estimators are quite complicated but are frequently considerably better than the best linear ones. This extends a result of M. Kanter.

I. INTRODUCTION 

This paper is concerned with the problem of estimating the current value, $X_n$, of a discrete-time stationary stochastic process given the $k$ previous values, $X_{n-1}, X_{n-2}, \ldots, X_{n-k}$. Denote such an estimator, or predictor, by

$$\hat{X}_n = f_k(X_{n-1}, X_{n-2}, \ldots, X_{n-k}). \tag{1}$$

We adopt the mean-squared error

$$\epsilon_k^2 = E[X_n - \hat{X}_n]^2 \tag{2}$$

as a figure of merit for the estimator $f_k$. Throughout the paper, we assume $EX_n = 0$, $n = 0, \pm 1, \pm 2, \ldots$.

It is well known, and easy to show, that no estimator has smaller mean-squared error than

$$f_k^*(X_{n-1}, \ldots, X_{n-k}) = E(X_n \mid X_{n-1}, \ldots, X_{n-k}), \tag{3}$$

the conditional expectation of $X_n$ given $X_{n-1}, \ldots, X_{n-k}$. We denote the mean-squared error of this best estimator by

$$\epsilon_k^{*2} \equiv E[X_n - f_k^*]^2 = E[X_n - E(X_n \mid X_{n-1}, \ldots, X_{n-k})]^2. \tag{4}$$

While the best estimator is simply described by (3), in practice it is frequently impossible to calculate it explicitly for processes of interest. The simpler class of linear estimators

$$f_{k,\mathrm{lin}}(X_{n-1}, \ldots, X_{n-k}) = \sum_{j=1}^{k} c_{kj} X_{n-j} \tag{5}$$

has been much studied in the past.$^1$ It is well known how to choose the $c$'s to obtain the smallest mean-squared error within this class of predictors, and this least mean-squared error is given by the simple formula

$$\epsilon_{k,\mathrm{lin}}^{*2} = D_{k+1}/D_k, \tag{6}$$

where $D_l$ is the $l \times l$ determinant whose entry in the $i$th row and $j$th column is

$$\rho_{i-j} \equiv E[X_i - EX_i][X_j - EX_j], \tag{7}$$

$i, j = 1, 2, \ldots, l$. The optimizing $c$'s are also given by determinants involving the $\rho_{i-j}$, so that all quantities of interest for the optimal linear predictor are specified by the second-order statistics of the process and are generally easy to compute explicitly. For this reason, if for no other, the optimal linear estimator has been much studied and used in practice.

How does $\epsilon_k^{*2}$ compare with $\epsilon_{k,\mathrm{lin}}^{*2}$? How much does nonlinear estimation buy? The answer, of course, depends on the process $\{X_n\}$. On one hand, for Gaussian processes $\epsilon_k^{*2} = \epsilon_{k,\mathrm{lin}}^{*2}$, so nothing is gained; on the other, one can construct processes for which $\epsilon_{k,\mathrm{lin}}^{*2}/\epsilon_k^{*2}$ is arbitrarily large.

In this paper we study predictors for some moving-average processes of the form

$$X_n = \sum_{j} A_j Y_{n-j}. \tag{8}$$

These processes are often used as models in applications. When the $Y$'s are identically distributed independent random variables, the $X$ process is sometimes called "filtered white noise." We establish for (8) a quite general bound of the form

$$\epsilon_k^{*2} \ge G(Y)\,\epsilon_{k,\mathrm{lin}}^{*2}, \tag{9}$$

where the constant $G(Y) \le 1$ depends only on the spectrum and entropy of the $Y$ process and is independent of the $A$'s of eq. (8). When the $Y$'s are independent identically distributed (i.i.d.) random variables, $G(Y)$ is particularly simple to compute. Our bound generalizes a similar one found by Kanter.$^2$

We are able to find $f_k^*$ and $\epsilon_k^{*2}$ explicitly for several special cases of (8). We treat in complete detail the case

$$X_n = A_0 Y_n + A_1 Y_{n-1}, \tag{10}$$

where the $Y$'s are either i.i.d. with a uniform distribution on an interval or are i.i.d. with a one-sided exponential distribution. Curves are presented in these cases that compare $\epsilon_k^{*2}$ with $\epsilon_{k,\mathrm{lin}}^{*2}$ for various values of the parameters involved. It is interesting to note here that even for $k = 1$, $\epsilon_1^{*2} < \epsilon_{\infty,\mathrm{lin}}^{*2}$ for a wide range of parameter values. The explicit results are compared with the bounds already mentioned.

Another special case of (10) is worked out in detail. Here $A_0 = 1$, $A_1 = \pm 1$ and the $Y_n$ are i.i.d. with the discrete distribution $\Pr[Y_n = 1] = p = 1 - \Pr[Y_n = -1]$.

We are also able to exhibit the best predictor when

$$X_n = \sum_{j=0}^{\infty} a^j Y_{n-j} \tag{11}$$

and the $Y$'s are i.i.d. uniform or have a one-sided exponential distribution. The best predictors are surprisingly complicated here. We obtained an expression for the least error, $\epsilon_k^{*2}$, but it is not included as it is apparently useless even for numerical calculation.

Our results are presented in detail in the next section. Derivations, 
proofs, and further discussion of general theory are relegated to the 
succeeding sections. 

II. RESULTS 

Let

$$X_n = Y_n - aY_{n-1}, \qquad EY_n = 0, \quad EY_n^2 = 1, \tag{12}$$

$n = 0, \pm 1, \pm 2, \ldots$, where the $Y$'s are independent and identically distributed random variables. Then

$$\rho_j = EX_nX_{n-j} = \begin{cases} 1 + a^2, & j = 0 \\ -a, & |j| = 1 \\ 0, & |j| > 1 \end{cases} \tag{13}$$

and the determinant $D_k$ of (6) has value $(1 - a^{2(k+1)})/(1 - a^2)$, so that the figure of merit for the best linear predictor is

$$\epsilon_{k,\mathrm{lin}}^{*2} = \frac{1 - a^{2k+4}}{1 - a^{2k+2}} \xrightarrow[k \to \infty]{} \begin{cases} 1, & |a| < 1 \\ a^2, & |a| > 1, \end{cases} \tag{14}$$

a result that does not depend on the distribution of $Y_n$. From (14) we see that the behavior of the best linear predictor as $k \to \infty$ depends markedly on whether or not $|a| < 1$. Since, from (12), $Y_n$ has unit variance and is independent of $Y_{n-1}, X_{n-1}, X_{n-2}, \ldots$, it follows that $\epsilon_k^{*2} \ge \operatorname{var} Y_n = 1$ for $k = 1, 2, \ldots$. Thus, from (14) and the fact that always $\epsilon_k^{*2} \le \epsilon_{k,\mathrm{lin}}^{*2}$, we have

$$\epsilon_{\infty,\mathrm{lin}}^{*2} = 1 = \epsilon_{\infty}^{*2}, \qquad |a| < 1. \tag{15}$$
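As a numerical illustration of (6), (13), and (14), the following sketch builds the covariance matrix of the process (12) and compares the determinant ratio $D_{k+1}/D_k$ with the closed form. The function names are ours and purely illustrative; the closed form is evaluated only for $|a| \ne 1$.

```python
import numpy as np

def ma1_cov_matrix(a, size):
    """Covariance matrix of `size` consecutive samples of X_n = Y_n - a*Y_{n-1}
    with unit-variance i.i.d. Y, using rho_0 = 1 + a^2, rho_1 = -a as in (13)."""
    m = np.zeros((size, size))
    for i in range(size):
        m[i, i] = 1.0 + a * a
        if i + 1 < size:
            m[i, i + 1] = -a
            m[i + 1, i] = -a
    return m

def linear_mse_det(a, k):
    """Least linear mean-squared error from the determinant ratio (6)."""
    return np.linalg.det(ma1_cov_matrix(a, k + 1)) / np.linalg.det(ma1_cov_matrix(a, k))

def linear_mse_closed(a, k):
    """Closed form (14); assumes |a| != 1 so the denominator is nonzero."""
    return (1.0 - a ** (2 * k + 4)) / (1.0 - a ** (2 * k + 2))
```

For large $k$ the closed form tends to 1 when $|a| < 1$ and to $a^2$ when $|a| > 1$, which is the limit quoted in (14).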

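When $Y_n$ in (12) has the uniform distribution with density

$$p_Y(y) = \begin{cases} \dfrac{1}{2\gamma}, & |y| \le \gamma \\ 0, & |y| > \gamma, \end{cases} \qquad \gamma = \sqrt{3}, \tag{16}$$

we find that

$$E(X_n \mid X_{n-1}, \ldots, X_{n-k}) = f_k^*(X_{n-1}, X_{n-2}, \ldots, X_{n-k}; a) = -\frac{a}{2}\left[\max(-\gamma, U_1, U_2, \ldots, U_k) + \min(\gamma, V_1, V_2, \ldots, V_k)\right], \quad a > 0, \tag{17}$$

where

$$U_i = \sum_{j=1}^{i} a^{j-1} X_{n-j} - a^i\gamma, \qquad V_i = \sum_{j=1}^{i} a^{j-1} X_{n-j} + a^i\gamma, \qquad i = 1, 2, \ldots, k. \tag{18}$$

When $a = -b < 0$,

$$E(X_n \mid X_{n-1}, \ldots, X_{n-k}) = f_k^*(-X_{n-1}, X_{n-2}, -X_{n-3}, \ldots, (-1)^k X_{n-k}; b). \tag{19}$$

For all values of $a$, the mean-squared error of this best predictor is

$$\epsilon_k^{*2} = 1 + 6a^2 \int_0^1 du\, u(1-u) \prod_{j=1}^{k} \left(1 - \frac{u}{|a|^j}\right)^{+}, \tag{20}$$

where $(x)^+ = x$ if $x > 0$ and is zero otherwise. This can be written in the alternative form

$$\epsilon_k^{*2} = \begin{cases} 1 + 6a^{2k+2} P_k(|a|), & |a| \le 1 \\ 1 + 6a^2 P_k\!\left(\dfrac{1}{|a|}\right), & |a| \ge 1, \end{cases} \tag{21}$$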
where the polynomial $P_k(x)$ of degree $k(k+1)/2$ in $x$ is given explicitly by

$$P_k(x) = \frac{1}{6} + \sum_{l=1}^{k} \frac{(-1)^l x^{l(l+1)/2}}{(l+2)(l+3)} \prod_{i=1}^{l} \frac{1 - x^{k+1-i}}{1 - x^{i}}. \tag{22}$$

Another form, more suitable for computation, is

$$P_k(x) = \sum_{l=0}^{k} \frac{b_l}{(l+2)(l+3)}, \qquad b_0 = 1, \quad b_{l+1} = \frac{x^{k+1} - x^{l+1}}{1 - x^{l+1}}\, b_l. \tag{23}$$
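The recursion (23) is easy to program. The sketch below evaluates $P_k(x)$ that way, cross-checks it against a midpoint-rule quadrature of the integral $\int_0^1 u(1-u)\prod_{l=1}^{k}(1 - ux^l)\,du$ from which the polynomial arises, and then evaluates (21). Function names and the quadrature resolution are our own choices; the recursion as written needs $x \ne 1$, so $|a| = 1$ is excluded.

```python
def p_k(x, k):
    """P_k(x) via (23): sum over l of b_l / ((l+2)(l+3)), with b_0 = 1."""
    b = 1.0
    total = b / 6.0                       # l = 0 term
    for l in range(k):
        # b_{l+1} = (x^{k+1} - x^{l+1}) / (1 - x^{l+1}) * b_l, valid for x != 1
        b *= (x ** (k + 1) - x ** (l + 1)) / (1.0 - x ** (l + 1))
        total += b / ((l + 3) * (l + 4))  # denominator for index l + 1
    return total

def p_k_quadrature(x, k, n=2000):
    """Midpoint-rule value of the integral of u(1-u) * prod_{l=1..k} (1 - u x^l) over (0, 1)."""
    h = 1.0 / n
    s = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        prod = 1.0
        for l in range(1, k + 1):
            prod *= 1.0 - u * x ** l
        s += u * (1.0 - u) * prod
    return s * h

def best_mse_uniform(a, k):
    """Best mean-squared error (21) for uniform Y; assumes |a| != 1."""
    t = abs(a)
    if t < 1.0:
        return 1.0 + 6.0 * t ** (2 * k + 2) * p_k(t, k)
    return 1.0 + 6.0 * a * a * p_k(1.0 / t, k)
```

For $k = 1$ the recursion gives $P_1(x) = 1/6 - x/12$, so for $|a| \ge 1$ the error is $1 + a^2 - |a|/2$, which drops below the limiting linear error $a^2$ exactly when $|a| > 2$.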



Figure 1 shows $\epsilon_k^{*2}/\epsilon_{\infty,\mathrm{lin}}^{*2}$ vs $a$ for the case at hand. When $|a| > 2$, the best estimate of $X_n$ based on just $X_{n-1}$ has slightly smaller mean-squared error than the best linear estimator based on the infinite past. As is seen, as $k$ increases, $\epsilon_k^{*2}$ is smaller than $\epsilon_{\infty,\mathrm{lin}}^{*2}$ for a large range of $a$ values.

Figure 2 compares the best estimator based on $k$ past samples of $X$ with the best linear estimator based on the same samples. For $|a| < 1$ and large $k$, the linear estimator does nearly as well as the best; but for $a \approx 1.5$ the nonlinear estimator (17) is significantly better. The curves shown approach the limit indicated as $k \to \infty$. For $a > 1.2$, to the scale shown on Fig. 2, it coincides with the curve labeled $k = 20$. The bound (9) gives $\epsilon_k^{*2}/\epsilon_{k,\mathrm{lin}}^{*2} \ge 6/\pi e = 0.703$, a value about 14 percent less than the minimum shown on Fig. 2.

As noted in (13), the process (12) always has variance $EX_n^2 = 1 + a^2$. Figure 3 shows the mean-squared error of the best predictor for a process of form (12) scaled to have unit variance. Again the $Y$'s are i.i.d. with a uniform distribution. The limiting curve for $k \to \infty$ has the value $1/(1 + a^2)$ for $|a| < 1$ and, to the scale shown, coincides with the curve labeled $k = 40$ for $a > 1$.



Fig. 1. Comparison of $\epsilon_1^{*2}$ and $\epsilon_2^{*2}$ with $\epsilon_{\infty,\mathrm{lin}}^{*2}$ for $X_n = Y_n - aY_{n-1}$ with the $Y$'s i.i.d. uniform variates.

Fig. 2. Comparison of $\epsilon_k^{*2}$ with $\epsilon_{k,\mathrm{lin}}^{*2}$ for $X_n = Y_n - aY_{n-1}$ with the $Y$'s i.i.d. uniform variates.

Fig. 3. $\epsilon_k^{*2}/(1 + a^2)$ vs $a$ for $X_n = Y_n - aY_{n-1}$ with the $Y$'s i.i.d. uniform variates.

Fig. 4. The best estimator $f_1^*(x; a)$ for the process $X_n = Y_n - aY_{n-1}$ with the $Y$'s i.i.d. uniform variates.

It is instructive to examine (17) more closely to understand the nonlinear nature of the best estimator. Figure 4a shows $f_1^*(x; a)$ as a function of $x$ for $a > 1$; Fig. 4b shows this quantity when $0 < a < 1$. Consider the case where $a > 1$. Now

$$X_{n-1} = Y_{n-1} - aY_{n-2}$$

and each $|Y| \le \gamma$. If an observation of $X_{n-1}$ has the value $(1 + a)\gamma$, then we must have $Y_{n-1} = \gamma$ and $Y_{n-2} = -\gamma$. Then $X_n = Y_n - aY_{n-1} = Y_n - a\gamma$. Since $Y_n$ has a symmetric distribution, the best estimate of $X_n$ is now its mean, $-a\gamma$. If now, instead of observing $X_{n-1}$ at the extreme value $(1 + a)\gamma$, we observe a value $x$ near the extreme, say where $(a - 1)\gamma \le x \le (1 + a)\gamma$, we still obtain some information about $Y_{n-1}$. In fact, one easily calculates for $x$ in this range that

$$p_{Y_{n-1} \mid X_{n-1}}(y \mid x) = \begin{cases} \dfrac{1}{(1 + a)\gamma - x}, & x - a\gamma \le y \le \gamma \\ 0, & \text{otherwise} \end{cases}$$

and that $E[Y_{n-1} \mid X_{n-1} = x] = \frac{1}{2}[x + (1 - a)\gamma]$. Then, from (12),

$$E(X_n \mid X_{n-1} = x) = E(Y_n \mid X_{n-1} = x) - aE(Y_{n-1} \mid X_{n-1} = x) = -\frac{a}{2}[x + (1 - a)\gamma].$$

When $|x| \le (a - 1)\gamma$, $p_{Y_{n-1} \mid X_{n-1}}(y \mid x) = p_{Y_{n-1}}(y)$ and knowledge of $X_{n-1}$ no longer gives information about $Y_{n-1}$. The best estimate of $X_n$ is then its mean, which is zero. The case $|a| < 1$ can be discussed in a similar manner. When $k$, the number of past $X$ values observed, is larger than 1, this sort of analysis becomes difficult, however, and the intuitive understanding of (17) becomes obscure.
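The estimator (17) with the quantities of (18) can be sketched directly. The code below is only an illustration under the assumptions $a > 0$ and unit-variance uniform $Y$'s (so $\gamma = \sqrt{3}$); the function name and the convention that `past[0]` holds $X_{n-1}$, `past[1]` holds $X_{n-2}$, and so on are ours.

```python
import math

GAMMA = math.sqrt(3.0)  # half-width of the unit-variance uniform density

def best_predictor_uniform(past, a):
    """Evaluate (17)-(18) for a > 0; past[0] = X_{n-1}, past[1] = X_{n-2}, ..."""
    partial = 0.0          # running value of sum_{j<=i} a^{j-1} X_{n-j}
    lower = [-GAMMA]       # -gamma, U_1, ..., U_k
    upper = [GAMMA]        # +gamma, V_1, ..., V_k
    for i, x in enumerate(past, start=1):
        partial += a ** (i - 1) * x
        lower.append(partial - a ** i * GAMMA)
        upper.append(partial + a ** i * GAMMA)
    return -0.5 * a * (max(lower) + min(upper))
```

With a single observation and $a = 2$, the function returns zero for $|x| \le \gamma$, follows $-(a/2)[x + (1 - a)\gamma]$ for $\gamma \le x \le 3\gamma$, and reaches $-a\gamma$ at the extreme $x = 3\gamma$, in line with the discussion of Fig. 4.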

Figures 5, 6, 7, and 8 give results similar to those of Figs. 2 and 3, but for the case in which the $Y$'s of (12) are i.i.d. with density

$$p_Y(y) = \begin{cases} e^{-(y+1)}, & y \ge -1 \\ 0, & y < -1. \end{cases} \tag{24}$$

For $a > 0$, we find

$$f_k^* = -a\left[\delta + \max(-1, W_1, W_2, \ldots, W_k)\right], \tag{25}$$

where

$$W_i = \sum_{j=1}^{i} a^{j-1} X_{n-j} - a^i, \quad i = 1, 2, \ldots, k, \qquad \delta = \frac{a^k(1 - a)}{1 - a^{k+1}}, \tag{26}$$

and the figure of merit for this best estimator is

$$\epsilon_k^{*2} = 1 + a^2\delta^2. \tag{27}$$
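A minimal sketch of (25) to (27), together with the linear figure of merit (14) for comparison, is given below. It assumes $a > 0$ and $a \ne 1$ (the expression for $\delta$ in (26) is then well defined); function names and the ordering `past[0]` $= X_{n-1}$ are our own conventions.

```python
def delta(a, k):
    """delta of (26); assumes a != 1."""
    return a ** k * (1.0 - a) / (1.0 - a ** (k + 1))

def best_predictor_exponential(past, a):
    """Evaluate (25) for a > 0; past[0] = X_{n-1}, past[1] = X_{n-2}, ..."""
    k = len(past)
    partial = 0.0
    candidates = [-1.0]                       # -1, W_1, ..., W_k
    for i, x in enumerate(past, start=1):
        partial += a ** (i - 1) * x
        candidates.append(partial - a ** i)   # W_i of (26)
    return -a * (delta(a, k) + max(candidates))

def best_mse_exponential(a, k):
    """Figure of merit (27)."""
    return 1.0 + (a * delta(a, k)) ** 2

def linear_mse(a, k):
    """Best linear figure of merit (14); assumes |a| != 1."""
    return (1.0 - a ** (2 * k + 4)) / (1.0 - a ** (2 * k + 2))
```

For example, with $k = 1$ and $a = 0.5$ the best error is $1 + 1/36$ while the best linear error is $1.05$.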
Fig. 5. Comparison of $\epsilon_k^{*2}$ with $\epsilon_{k,\mathrm{lin}}^{*2}$ for $a > 0$ for $X_n = Y_n - aY_{n-1}$ with the $Y$'s i.i.d. one-sided exponential variates.

Fig. 6. Comparison of $\epsilon_k^{*2}$ with $\epsilon_{k,\mathrm{lin}}^{*2}$ for $a = -b < 0$ for $X_n = Y_n - aY_{n-1}$ with the $Y$'s i.i.d. one-sided exponential variates.

Fig. 7. $\epsilon_k^{*2}/(1 + a^2)$ vs $a > 0$ for $X_n = Y_n - aY_{n-1}$ with the $Y$'s i.i.d. one-sided exponential variates.


For $a = -b < 0$, the estimator is given by the more complicated expression

$$f_k^* = -a\left[\delta + \frac{A e^{-A/\delta} - B e^{-B/\delta}}{e^{-A/\delta} - e^{-B/\delta}}\right], \tag{28}$$

$$A = \max[-1, W_2, W_4, \ldots, W_{k_e}], \qquad B = \min[W_1, W_3, \ldots, W_{k_o}],$$

$$k_e = \begin{cases} k, & k \text{ even} \\ k - 1, & k \text{ odd}, \end{cases} \qquad k_o = \begin{cases} k, & k \text{ odd} \\ k - 1, & k \text{ even}. \end{cases}$$

Fig. 8. $\epsilon_k^{*2}/(1 + a^2)$ vs $a = -b < 0$ for $X_n = Y_n - aY_{n-1}$ with the $Y$'s i.i.d. one-sided exponential variates.

Its mean-squared error is given by 



tfib) = 1 + 



b k+ \\ + b) 
1 - i-b) 



k+\ 



1 - 2\(1 + A) £ 



n % (1 + n\y 



A = 



(& - l)(b k+i + l)/(b(b k - 1)), * - 2, 4, 6, 



6 - 1, b > 1 



--1,6<1 
o 



k - 1, 3, 5, • • 



6* 2 (1) = 1 + 



(A + D 



2' 



* - 1, 3, 



(29) 



It is seen from the curves that, for $a > 0$, $\epsilon_k^{*2}(a) < \epsilon_k^{*2}(-a)$, at least for the graphs drawn. This raises the question of comparing $\epsilon_k^{*2}(a)$ and $\epsilon_k^{*2}(-a)$ in general. It is easy to see (Appendix A) that these are equal if $Y$ has a symmetric distribution. We show in Appendix B that $\epsilon_k^{*2}(a, Y) \le \epsilon_k^{*2}(-a, Y)$ for every $Y$ if $k = 1$ and $a = 1$. However, the inequality is false in general for $k = 1$ if $a \ne 1$. The case $k = 1$, $a = 1$ is thus special, and the inequality is shown there to be the same as the fact that, if $Y_0$ and $Y_1$ are i.i.d. variates, then the average conditional variance of $Y_0$ given $Y_0 - Y_1$ is smaller than the average conditional variance of $Y_0$ given $Y_0 + Y_1$.

The bound (9) for the present case yields $\epsilon_k^{*2}/\epsilon_{k,\mathrm{lin}}^{*2} \ge e/2\pi = 0.4326$, a value 14 percent less than the minimum shown on Fig. 5.

Consider now the case in which the $Y$'s of (12) have the discrete distribution

$$\Pr[Y_n = \lambda] = p, \qquad \Pr[Y_n = \mu] = q = 1 - p, \qquad \lambda < \mu. \tag{30}$$

To satisfy $EY_n = 0$ and $EY_n^2 = 1$, we must have

$$\lambda = -\sqrt{q/p}, \qquad \mu = \sqrt{p/q}. \tag{31}$$

We assume $a \ne 0$. Each $X_n$ can then take only four possible values: $\mu - a\mu$, $\mu - a\lambda$, $\lambda - a\mu$, $\lambda - a\lambda$. If $a \ne \pm 1$, these four values are distinct. In this case, observation of $X_{n-1}$ allows one to deduce the value of $Y_{n-1}$. The best estimate of $X_n$ is then $-a$ times this value of $Y_{n-1}$, and this estimator has figure of merit $\epsilon^{*2} = 1$, the least value possible. The best linear estimator still has variance given by (14) and, for $|a| > 1$, this can be arbitrarily large. If $a = \pm 1$, however, the values of $X_{n-1}$ no longer determine $Y_{n-1}$, and the best estimator is more complicated. It is,† for $a = 1$,



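$$f_k^* = \begin{cases} -(Z_k + \lambda), & \text{at least one } Z = \mu - \lambda \\ -(Z_k + \mu), & \text{at least one } Z = \lambda - \mu \\ -\dfrac{\lambda p^{k+1} + \mu q^{k+1}}{p^{k+1} + q^{k+1}}, & \text{all } Z\text{'s} = 0 \end{cases} \tag{32}$$

and, for $a = -1$,

$$f_k^* = Z_k + (-1)^k \begin{cases} \dfrac{\lambda p^{s+1} q^t + \mu q^{s+1} p^t}{p^{s+1} q^t + q^{s+1} p^t}, & \text{all } Z\text{'s either } 0 \text{ or } \lambda + \mu \\ \lambda, & \text{at least one } Z \text{ either } 2\lambda \text{ or } \mu - \lambda \\ \mu, & \text{at least one } Z \text{ either } 2\mu \text{ or } \lambda - \mu \end{cases} \tag{33}$$

where

$$Z_i = \sum_{j=1}^{i} a^{j-1} X_{n-j-(k-i)}, \qquad i = 1, 2, \ldots, k, \tag{34}$$

$s$ = number of positive even integers $\le k$,
$t$ = number of positive odd integers $\le k$.

The figure of merit for these estimators is

$$\epsilon_k^{*2} = 1 + \frac{p^k q^k}{p^{k+1} + q^{k+1}}, \qquad a = 1, \tag{35}$$

$$\epsilon_k^{*2} = 1 + \begin{cases} (pq)^{k/2}, & k \text{ even} \\ \tfrac{1}{2}(pq)^{(k-1)/2}, & k \text{ odd}, \end{cases} \qquad a = -1. \tag{36}$$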


† It turns out that the three alternatives in (32) and (33) are mutually exclusive and exhaustive, respectively.

The best linear estimator in this case for $a = \pm 1$ has mean-squared error

$$\epsilon_{k,\mathrm{lin}}^{*2} = 1 + \frac{1}{k + 1}. \tag{37}$$



Figure 9 shows the best estimator for this case. 
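Because the binary case involves only finitely many outcomes, the figures of merit (35) and (36) can be checked by exhaustive enumeration. The sketch below is our own illustrative check, not the authors' derivation: it uses the identity $\epsilon_k^{*2} = 1 + a^2 E[\operatorname{var}(Y_{n-1} \mid X_{n-1}, \ldots, X_{n-k})]$, which follows from (12) because $Y_n$ is independent of the past, and groups the $2^{k+1}$ outcomes of $Y_{n-1}, \ldots, Y_{n-k-1}$ by the observed $X$ values. All function names are hypothetical.

```python
import itertools
import math
from collections import defaultdict

def binary_levels(p):
    """lambda and mu of (31) for Pr[Y = lambda] = p."""
    q = 1.0 - p
    return -math.sqrt(q / p), math.sqrt(p / q)

def brute_force_mse(p, a, k):
    """Exact E[(X_n - E(X_n | X_{n-1},...,X_{n-k}))^2] by enumeration."""
    lam, mu = binary_levels(p)
    q = 1.0 - p
    # Per observation pattern: [probability, sum of prob*y, sum of prob*y^2], y = Y_{n-1}.
    groups = defaultdict(lambda: [0.0, 0.0, 0.0])
    for outcome in itertools.product(((lam, p), (mu, q)), repeat=k + 1):
        vals = [v for v, _ in outcome]          # vals[j] = Y_{n-1-j}
        prob = math.prod(w for _, w in outcome)
        obs = tuple(round(vals[j] - a * vals[j + 1], 9) for j in range(k))  # X_{n-1-j}
        g = groups[obs]
        g[0] += prob
        g[1] += prob * vals[0]
        g[2] += prob * vals[0] ** 2
    expected_cond_var = sum(g[2] - g[1] ** 2 / g[0] for g in groups.values())
    return 1.0 + a * a * expected_cond_var

def mse_a_plus_one(p, k):
    """Closed form (35), a = 1."""
    q = 1.0 - p
    return 1.0 + (p * q) ** k / (p ** (k + 1) + q ** (k + 1))

def mse_a_minus_one(p, k):
    """Closed form (36), a = -1."""
    q = 1.0 - p
    if k % 2 == 0:
        return 1.0 + (p * q) ** (k // 2)
    return 1.0 + 0.5 * (p * q) ** ((k - 1) // 2)
```

For $a \ne \pm 1$ the enumeration returns 1, since $X_{n-1}$ then determines $Y_{n-1}$; for $p = 1/2$ and $a = 1$ it gives $1 + 2^{-k}$, to be compared with the linear value $1 + 1/(k+1)$ of (37).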

The case (12) just treated is indeed as general as (10) in which the i.i.d. $Y$'s have finite variance $\sigma^2$ and mean $m$. If $f_k^*(X_{n-1}, \ldots, X_{n-k}; a)$ and $\epsilon_k^{*2}(a)$ are for (12) the best estimator and its figure of merit, then the best estimator for $X_n$ of (10) is

$$A_0\sigma f_k^*(\tilde{X}_{n-1}, \ldots, \tilde{X}_{n-k}; -A_1/A_0) + m(A_0 + A_1),$$

where $\tilde{X}_j \equiv [X_j - (A_0 + A_1)m]/(A_0\sigma)$ for all $j$, and the figure of merit for this best estimator is $A_0^2\sigma^2\epsilon_k^{*2}(-A_1/A_0)$.

As for the inequality (9) mentioned in Section I, we show in Section VI that when $X_n$ is given by (8)

$$\epsilon_k^{*2} \ge \epsilon_{k,\mathrm{lin}}^{*2}\, \frac{1}{2\pi e}\, \frac{e^{2H(Y)}}{\bar{S}_Y},$$

where $H(Y)$ is the differential entropy of the process $Y$ (defined in (156)), and $\bar{S}_Y = \exp\{\int_0^1 \log S_Y(f)\, df\}$ is the geometric mean of the spectral density $S_Y(f)$, defined in (178). The heart of the result is Theorem 2, in Section VI, which relates the entropy $H(X)$ to the entropy $H(Y)$.
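The two numerical bounds quoted earlier in this section, $6/\pi e$ for uniform $Y$'s and $e/2\pi$ for one-sided exponential $Y$'s, can be reproduced with a few lines. The sketch assumes i.i.d. unit-variance $Y$'s, takes $H(Y)$ to be the differential entropy of a single $Y$ in nats, and takes the geometric mean of the spectrum to be 1 for such a white sequence; these normalizations are our reading of the text rather than a restatement of the definitions in (156) and (178).

```python
import math

def bound_constant(entropy_nats, spectral_geometric_mean=1.0):
    """G(Y) = exp(2 H(Y)) / (2 pi e * geometric mean of the spectrum)."""
    return math.exp(2.0 * entropy_nats) / (2.0 * math.pi * math.e * spectral_geometric_mean)

# Uniform on (-sqrt(3), sqrt(3)): differential entropy log(2 sqrt(3)).
g_uniform = bound_constant(math.log(2.0 * math.sqrt(3.0)))

# One-sided exponential with unit rate (shifted to zero mean): differential entropy 1.
g_exponential = bound_constant(1.0)
```

A Gaussian $Y$ of unit variance has entropy $\frac{1}{2}\log(2\pi e)$ and gives a constant of exactly 1, consistent with the remark that nothing is gained by nonlinear estimation in the Gaussian case.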



Fig. 9. $\epsilon_k^{*2}$ for the discrete valued process $X_n = Y_n - aY_{n-1}$ where the $Y$'s are i.i.d. variates taking two values, $\Pr[Y_n = -\sqrt{q/p}] = p = 1 - \Pr[Y_n = \sqrt{p/q}] = 1 - q$.
III. GENERAL THEORY

We consider the moving-average process

$$X_n = Y_n + a_1 Y_{n-1} + \cdots + a_M Y_{n-M} = \sum_{j=0}^{M} a_j Y_{n-j},$$

$$EY_n = 0, \quad EY_n^2 = 1, \quad a_0 = 1, \quad a_j \equiv 0 \text{ for } j > M \text{ and } j < 0, \qquad n = 0, \pm 1, \pm 2, \ldots. \tag{38}$$



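Let

$$R^{(l)} = \begin{pmatrix} 1 & a_1 & a_2 & \cdots & a_{l-1} \\ 0 & 1 & a_1 & \cdots & a_{l-2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & a_1 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}_{l \times l} \tag{39}$$

be the $l \times l$ upper-triangular matrix with element $a_{j-i}$ in the $i$th row and $j$th column, $i, j = 1, 2, \ldots, l$. We adopt the notation

$$X_i^j = (X_j, X_{j-1}, \ldots, X_i)$$

for the column vector whose components from top to bottom are $X_j, X_{j-1}, \ldots, X_i$. Then from (38) we can write

$$X_{n-k}^{n} = R^{(k+1)} Y_{n-k}^{n} + W^{(n,k+1)}, \tag{40}$$

where the components of the $(k + 1)$-column vector $W^{(n,k+1)}$ are, from top to bottom,

$$W_i^{(n,k+1)} = \sum_{l=1}^{M} a_{k+1+l-i}\, Y_{n-k-l}, \qquad i = 1, 2, \ldots, k + 1. \tag{41}$$

Now denote the inverse of $R^{(l)}$ by

$$S^{(l)} = [R^{(l)}]^{-1} = (S_{ij}^{(l)})_{l \times l}. \tag{42}$$

By multiplying (40) by $S^{(k+1)}$, we find

$$S^{(k+1)} X_{n-k}^{n} = Y_{n-k}^{n} + S^{(k+1)} W^{(n,k+1)}. \tag{43}$$

The first component of (43) yields

$$\sum_{j=1}^{k+1} S_{1j}^{(k+1)} X_{n+1-j} = Y_n + \sum_{j=1}^{k+1} S_{1j}^{(k+1)} W_j^{(n,k+1)},$$

$n = 0, \pm 1, \ldots$, $k = 1, 2, \ldots$.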
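Using (41), we obtain the useful result

$$\sum_{j=1}^{k+1} S_{1j}^{(k+1)} X_{n+1-j} = Y_n + \sum_{l=1}^{M} A_l^{(k+1)} Y_{n-k-l}, \qquad n = 0, \pm 1, \ldots, \quad k = 1, 2, \ldots, \tag{44}$$

where

$$A_l^{(k+1)} = \sum_{j=1}^{k+1} S_{1j}^{(k+1)} a_{k+1+l-j}, \qquad l = 1, 2, \ldots, M. \tag{45}$$

From (44) we see that

$$X_n = Y_n - \sum_{j=2}^{k+1} S_{1j}^{(k+1)} X_{n+1-j} + \sum_{l=1}^{M} A_l^{(k+1)} Y_{n-k-l}, \tag{46}$$

since clearly $S_{11}^{(k+1)} = 1$. From (46) it follows that

$$f_k^*(X_{n-k}^{n-1}) = E(X_n \mid X_{n-k}^{n-1}) = -\sum_{j=2}^{k+1} S_{1j}^{(k+1)} X_{n+1-j} + \sum_{l=1}^{M} A_l^{(k+1)} E(Y_{n-k-l} \mid X_{n-k}^{n-1}). \tag{47}$$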

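Before proceeding further with the calculation of the expectations on the right of (47), we comment on the special form of the matrix $S^{(l)}$. Consider the quantities $d_l$, $l = 1, 2, \ldots$, defined by

$$d_l = (-1)^l \begin{vmatrix} a_1 & a_2 & a_3 & \cdots & a_l \\ 1 & a_1 & a_2 & \cdots & a_{l-1} \\ 0 & 1 & a_1 & \cdots & a_{l-2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & a_1 \end{vmatrix}, \tag{48}$$

where the determinant on the right has entries constant along diagonals as indicated. Expanding the determinant by the elements of the first row, we see that

$$d_l = -\sum_{j=1}^{l} a_j d_{l-j}, \qquad d_0 = 1, \qquad l = 1, 2, \ldots. \tag{49}$$

We extend the definition of the $d$'s by $d_j \equiv 0$, $j < 0$. It follows easily then by direct matrix multiplication that

$$S^{(l)} = \begin{pmatrix} 1 & d_1 & d_2 & \cdots & d_{l-1} \\ 0 & 1 & d_1 & \cdots & d_{l-2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & d_1 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}_{l \times l} \tag{50}$$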
is indeed the inverse of $R^{(l)}$ displayed in (39). Thus we have

$$S_{ij}^{(l)} = d_{j-i}, \qquad i, j = 1, 2, \ldots, l, \tag{51}$$

and the quantities $A_l^{(k)}$ of (44) are

$$A_l^{(k)} = \sum_{j=1}^{k} a_{j+l-1} d_{k-j}, \qquad l = 1, 2, \ldots, M. \tag{52}$$

The asymptotic behavior of the $A_l^{(k)}$ as $k \to \infty$ is readily seen from this form. Since the $d$'s satisfy the linear recurrence (49),

$$d_l = \sum_{i=1}^{M} D_i \alpha_i^l, \tag{53}$$

where the quantities $\alpha_1, \alpha_2, \ldots, \alpha_M$ are the reciprocals of the roots (here assumed distinct) of the polynomial

$$Q(z) = \sum_{j=0}^{M} a_j z^j. \tag{54}$$

In terms of these roots, (52) becomes

$$A_l^{(k)} = \sum_{j=1}^{k} a_{j+l-1} \sum_{i=1}^{M} D_i \alpha_i^{k-j}.$$

Now let $k = M + p$, $p \ge 0$. Since $a_i = 0$ for $i > M$, we have

$$A_l^{(M+p)} = \sum_{j=1}^{M} a_{j+l-1} \sum_{i=1}^{M} D_i \alpha_i^{M+p-j} = \sum_{i=1}^{M} D_i \alpha_i^{p} \sum_{j=1}^{M} a_{M+l-j}\, \alpha_i^{j-1}. \tag{55}$$

Since the inner sum here is independent of $p$, we see that

$$|\alpha_i| < 1, \; i = 1, 2, \ldots, M \;\Longrightarrow\; |A_l^{(k)}| \to 0 \text{ as } k \to \infty, \quad l = 1, 2, \ldots, M. \tag{56}$$

Now (46) shows $X_n$ to be a linear combination of $Y_n$ (which is independent of past $X$'s), the expression

$$\hat{X}_k \equiv -\sum_{j=2}^{k+1} S_{1j}^{(k+1)} X_{n+1-j} = -\sum_{j=1}^{k} d_j X_{n-j} \tag{57}$$

which is linear in past $X$'s (we have used (51)), and the random variable $\sum_{l=1}^{M} A_l^{(k+1)} Y_{n-k-l} \equiv \chi$. We have $E\chi = 0$, $E\chi^2 = \sum_{l=1}^{M} [A_l^{(k+1)}]^2$, and this quantity approaches zero with increasing $k$ if the $|\alpha_i|$ are all less than unity. Thus we see that in this case $E(X_n - \hat{X}_k)^2 \to 1$ as $k \to \infty$.
Theorem 1: If the roots of

$$\sum_{j=0}^{M} a_j z^j = 0$$

are all of magnitude greater than unity, a best estimator based on the infinite past is the linear form

$$f_\infty^*(X_{n-1}, X_{n-2}, \ldots) = -\sum_{j=1}^{\infty} d_j X_{n-j}$$

and

$$\epsilon_\infty^{*2} = 1 = \epsilon_{\infty,\mathrm{lin}}^{*2}.$$

Note that the linear estimator $\hat{X}_k$ of (57) is not the best linear estimator for any finite $k$. However, unlike the coefficients $c_{kj}$ of the best linear estimator (5), the nonzero coefficients in (57) do not depend on $k$.
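The recurrence (49), the inverse relation between (39) and (50), and the decay (56) of the quantities in (52) are all easy to check numerically. The sketch below is illustrative only: the coefficient list `a` holds $a_0 = 1, a_1, \ldots, a_M$, and the helper names are ours.

```python
import numpy as np

def d_coefficients(a, count):
    """d_0, ..., d_{count-1} from the recurrence (49); a = [1, a_1, ..., a_M]."""
    M = len(a) - 1
    d = [1.0]
    for l in range(1, count):
        d.append(-sum(a[j] * d[l - j] for j in range(1, min(l, M) + 1)))
    return d

def upper_toeplitz(first_row):
    """Upper-triangular matrix that is constant along diagonals, as in (39) and (50)."""
    n = len(first_row)
    m = np.zeros((n, n))
    for i in range(n):
        m[i, i:] = first_row[: n - i]
    return m

def tail_coefficients(a, k):
    """A_l^{(k)} = sum_{j=1}^{k} a_{j+l-1} d_{k-j}, l = 1, ..., M, as in (52)."""
    M = len(a) - 1
    d = d_coefficients(a, k)                     # d_0, ..., d_{k-1}
    coef = lambda i: a[i] if 0 <= i <= M else 0.0
    return [sum(coef(j + l - 1) * d[k - j] for j in range(1, k + 1))
            for l in range(1, M + 1)]
```

For $M = 1$ with $a_1 = -a$, the recurrence gives $d_l = a^l$ and `tail_coefficients` returns the single value $-a^k$. With `a = [1.0, 0.4, -0.3]`, whose polynomial (54) has both roots outside the unit circle, the $A_l^{(k)}$ shrink geometrically as $k$ grows.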

We return now to the calculation of $f_k^*$ as given by (47). Rewrite (43) replacing $n$ by $n - 1$ and $k$ by $k - 1$ to obtain

$$S^{(k)} X_{n-k}^{n-1} = Y_{n-k}^{n-1} + S^{(k)} W^{(n-1,k)}$$

or in component form

$$\sum_{j=1}^{k} S_{ij}^{(k)} X_{n-j} = Y_{n-i} + \sum_{j=1}^{k} S_{ij}^{(k)} \sum_{l=1}^{M} a_{k+l-j} Y_{n-k-l}, \qquad i = 1, 2, \ldots, k,$$

or

$$Z_{k+1-i} \equiv \sum_{j=i}^{k} d_{j-i} X_{n-j} = Y_{n-i} + \sum_{l=1}^{M} A_l^{(k+1-i)} Y_{n-k-l}, \qquad i = 1, 2, \ldots, k, \tag{58}$$

the $A$'s being given as before by (45). The triangular matrix $S^{(k)}$ connecting $Z_k, Z_{k-1}, \ldots, Z_1$ and the observed $X_{n-1}, X_{n-2}, \ldots, X_{n-k}$ via $Z_1^k = S^{(k)} X_{n-k}^{n-1}$ is nonsingular, so that in (47) we can now write

$$E(Y_{n-k-l} \mid X_{n-k}^{n-1}) = E(Y_{n-k-l} \mid Z_1^k). \tag{59}$$

Now

$$E(Y_{n-k-l} \mid Z_1^k) = \int dy_1 \cdots \int dy_M \; y_l \; p_{Y_{n-k-M}^{n-k-1} \mid Z_1^k}(y_1^M \mid Z_1^k). \tag{60}$$
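But, by Bayes' rule for conditional probabilities,

$$p_{Y_{n-k-M}^{n-k-1} \mid Z_1^k}(y_1^M \mid Z_1^k) = \frac{p_{Z_1^k \mid Y_{n-k-M}^{n-k-1}}(Z_1^k \mid y_1^M)\; p_{Y_{n-k-M}^{n-k-1}}(y_1^M)}{\int dy_1 \cdots \int dy_M \; p_{Z_1^k \mid Y_{n-k-M}^{n-k-1}}(Z_1^k \mid y_1^M)\; p_{Y_{n-k-M}^{n-k-1}}(y_1^M)}. \tag{61}$$

Denote the density of $Y_j$ by

$$p_{Y_j}(y) = g(y). \tag{62}$$

Since the $Y$'s are i.i.d.,

$$p_{Y_{n-k-M}^{n-k-1}}(y_1^M) = \prod_{i=1}^{M} g(y_i). \tag{63}$$

Furthermore, we see from (58) that given $Y_{n-k-M}^{n-k-1} = y_1^M$ the $Z$'s are independent random variables, so that

$$p_{Z_1^k \mid Y_{n-k-M}^{n-k-1}}(Z_1^k \mid y_1^M) = \prod_{j=1}^{k} g\!\left(Z_j - \sum_{l=1}^{M} A_l^{(j)} y_l\right). \tag{64}$$

Equations (59) to (64) combine to yield

$$E(Y_{n-k-l} \mid X_{n-k}^{n-1}) = \frac{I_1(Z_1^k)}{I_2(Z_1^k)}, \tag{65}$$

$$I_1(Z_1^k) = \int dy_1 \cdots \int dy_M \; y_l \prod_{i=1}^{M} g(y_i) \prod_{j=1}^{k} g\!\left(Z_j - \sum_{\nu=1}^{M} A_\nu^{(j)} y_\nu\right), \tag{66}$$

$$I_2(Z_1^k) = \int dy_1 \cdots \int dy_M \prod_{i=1}^{M} g(y_i) \prod_{j=1}^{k} g\!\left(Z_j - \sum_{\nu=1}^{M} A_\nu^{(j)} y_\nu\right). \tag{67}$$

As we shall see, these $M$-fold integrals can be explicitly evaluated in certain special cases.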
For the figure of merit, we find

$$\epsilon_k^{*2} = E[(X_n - f_k^*)(X_n - f_k^*)] = E[(X_n - f_k^*)X_n] = EX_n^2 - EX_n f_k^*, \tag{68}$$

since the best predictor $f_k^* = E(X_n \mid X_{n-k}^{n-1})$ is uncorrelated with the prediction error $X_n - f_k^*$. For the two terms in (68), we have

$$EX_n^2 = \sum_{j=0}^{M} a_j^2 \tag{69}$$

from (38) and

$$EX_n f_k^* = -\sum_{j=2}^{k+1} S_{1j}^{(k+1)} EX_nX_{n+1-j} + \sum_{l=1}^{M} A_l^{(k+1)} E[X_n E(Y_{n-k-l} \mid X_{n-k}^{n-1})] \tag{70}$$
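from (47). These terms can be developed further as

$$J \equiv \sum_{j=1}^{k} S_{1,j+1}^{(k+1)} EX_nX_{n-j} = \sum_{j=1}^{k} d_j \sum_{l=0}^{M-j} a_l a_{l+j} = \sum_{\nu=1}^{M} a_\nu \sum_{l=0}^{\nu-1} a_l d_{\nu-l} \tag{71}$$

on letting $l + j = \nu$. But from (49)

$$\sum_{l=0}^{\nu-1} a_l d_{\nu-l} = -a_\nu,$$

so if $k \ge M$,

$$J = \sum_{\nu=1}^{M} a_\nu \sum_{l=0}^{\nu-1} a_l d_{\nu-l} = -\sum_{\nu=1}^{M} a_\nu^2, \qquad k \ge M. \tag{72}$$

If $M > k$, (71) becomes

$$J = \sum_{\nu=1}^{k} a_\nu \sum_{l=0}^{\nu-1} a_l d_{\nu-l} + \sum_{\nu=k+1}^{M} a_\nu \sum_{l=\nu-k}^{\nu-1} a_l d_{\nu-l} = -\sum_{\nu=1}^{k} a_\nu^2 + \sum_{\nu=1}^{M-k} a_{\nu+k} \sum_{l=0}^{k-1} a_{\nu+l}\, d_{k-l}. \tag{73}$$

For the last term in (70), we find

$$E[X_n E(Y_{n-k-l} \mid X_{n-k}^{n-1})] = E\!\left[\left(\sum_{j=0}^{M} a_j Y_{n-j}\right) E(Y_{n-k-l} \mid Z_1^k)\right].$$

Combining these results, we have

$$\epsilon_k^{*2} = \begin{cases} 1 - \displaystyle\sum_{l=1}^{M} A_l^{(k+1)} \sum_{j=0}^{M} a_j E[Y_{n-j} E(Y_{n-k-l} \mid Z_1^k)], & k \ge M \\[2ex] \displaystyle\sum_{j=0}^{k} d_j \sum_{l=0}^{M-j} a_l a_{l+j} - \sum_{l=1}^{M} A_l^{(k+1)} \sum_{j=0}^{M} a_j E[Y_{n-j} E(Y_{n-k-l} \mid Z_1^k)], & k < M. \end{cases} \tag{74}$$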

IV. THE SPECIAL CASE M = 1

When $M = 1$, we write $a_1 = -a$ so that

$$X_n = Y_n - aY_{n-1}. \tag{75}$$

From (48), $d_l = a^l$, and from (52), $A_1^{(l)} = -a^l$ so that (58) now reads

$$Z_{k+1-i} = \sum_{j=i}^{k} a^{j-i} X_{n-j} = Y_{n-i} - a^{k+1-i} Y_{n-(k+1)}, \qquad i = 1, 2, \ldots, k. \tag{76}$$

Equations (66) and (67) are

$$I_1 = \int dy\; y\, g(y) \prod_{l=1}^{k} g(Z_l + a^l y) \tag{77}$$

and

$$I_2 = \int dy\; g(y) \prod_{l=1}^{k} g(Z_l + a^l y). \tag{78}$$

For the best predictor, (47) now reads

$$f_k^*(X_{n-k}^{n-1}) = -\sum_{j=1}^{k} a^j X_{n-j} - a^{k+1} I_1/I_2, \tag{79}$$

while (74) is

$$\epsilon_k^{*2} = 1 - a^{k+2} E[Y_{n-1} E(Y_{n-(k+1)} \mid Z_1^k)]. \tag{80}$$

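As an aid to the evaluation of (77) and (78), suppose that $g$ has support from $y = \lambda$ to $y = \mu > \lambda$, so that $g(y) = 0$ for $y < \lambda$ and $y > \mu$. The integrands in (77) and (78) then vanish unless simultaneously $\lambda \le y \le \mu$ and $\lambda - Z_l \le a^l y \le \mu - Z_l$, $l = 1, 2, \ldots, k$. Thus, the integration in (77) and (78) can be restricted to the range

$$\ell \le y \le m,$$

$$\ell = \max\!\left(\lambda, \frac{\lambda - Z_1}{a}, \frac{\lambda - Z_2}{a^2}, \ldots, \frac{\lambda - Z_k}{a^k}\right), \qquad m = \min\!\left(\mu, \frac{\mu - Z_1}{a}, \frac{\mu - Z_2}{a^2}, \ldots, \frac{\mu - Z_k}{a^k}\right), \qquad a > 0, \tag{81}$$

or, if $a = -b < 0$,

$$\bar{\ell} \le y \le \bar{m},$$

$$\bar{\ell} = \max\!\left(\lambda, \frac{Z_1 - \mu}{b}, \frac{\lambda - Z_2}{b^2}, \frac{Z_3 - \mu}{b^3}, \ldots\right)_{k+1}, \qquad \bar{m} = \min\!\left(\mu, \frac{Z_1 - \lambda}{b}, \frac{\mu - Z_2}{b^2}, \frac{Z_3 - \lambda}{b^3}, \ldots\right)_{k+1}, \tag{82}$$

where there are $k + 1$ quantities within the parentheses. Now use (76) and the notational aid

$$\xi_j \equiv Y_{n-j}, \qquad j = 1, 2, \ldots. \tag{83}$$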
Elementary operations show that

$$\ell = \xi_{k+1} + \frac{1}{a^k} \max_{j=1,\ldots,k+1} [a^{j-1}(\lambda - \xi_j)], \qquad m = \xi_{k+1} + \frac{1}{a^k} \min_{j=1,\ldots,k+1} [a^{j-1}(\mu - \xi_j)],$$

$$\bar{\ell} = \xi_{k+1} + \frac{1}{b^k} \max[b^k(\lambda - \xi_{k+1}),\; b^{k-1}(\xi_k - \mu),\; b^{k-2}(\lambda - \xi_{k-1}),\; \ldots]_{k+1},$$

$$\bar{m} = \xi_{k+1} + \frac{1}{b^k} \min[b^k(\mu - \xi_{k+1}),\; b^{k-1}(\xi_k - \lambda),\; b^{k-2}(\mu - \xi_{k-1}),\; \ldots]_{k+1}. \tag{84}$$

Since $\lambda - \xi \le 0$ and $\mu - \xi \ge 0$, we see that $\ell \le m$ and $\bar{\ell} \le \bar{m}$ with probability one.

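4.1 Y's are uniform

Let $Y_j$ have the density

$$p_Y(y) = g(y) = \begin{cases} \dfrac{1}{2\gamma}, & |y| \le \gamma \\ 0, & |y| > \gamma, \end{cases} \qquad \gamma \equiv \sqrt{3}. \tag{85}$$

Now $\lambda = -\gamma$, $\mu = \gamma$,

$$\frac{I_1}{I_2} = \frac{1}{2}(\ell + m),$$

and from (79)

$$f_k^*(X_{n-k}^{n-1}) = -\sum_{j=1}^{k} a^j X_{n-j} - \frac{a^{k+1}}{2}(\ell + m), \tag{86}$$

while (80) is

$$\epsilon_k^{*2} = 1 - \frac{a^{k+2}}{2} E[Y_{n-1}(\ell + m)]. \tag{87}$$

It is a matter of straightforward algebra to put (86), (81), and (76) in the form (17) to (18). We omit the details here.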
The evaluation of (87) can proceed as follows. Since $\xi_{k+1} = Y_{n-(k+1)}$ and $Y_{n-1} = \xi_1$ are independent, from (84)

$$\tfrac{1}{2} a^k E[Y_{n-1}(\ell + m)] = \tfrac{1}{2} E\!\left[\xi_1 \max_{j=1,\ldots,k+1} (a^{j-1}(-\gamma - \xi_j))\right] + \tfrac{1}{2} E\!\left[\xi_1 \min_{j=1,\ldots,k+1} (a^{j-1}(\gamma - \xi_j))\right] = 6E\!\left[(1 - 2\delta_0) \min_{j=0,\ldots,k} (\delta_j a^j)\right]. \tag{88}$$

Here we have set $\xi_j = \gamma(1 - 2\delta_{j-1})$, $j = 1, 2, \ldots, k + 1$, so that the $\delta$'s are i.i.d. variates uniform on $(0, 1)$. Then from (87) and (88),

$$\epsilon_k^{*2} = 1 - 6a^2 E(1 - 2\delta)\min(\delta, \tilde{\delta}), \tag{89}$$

where $\delta = \delta_0$ and

$$\tilde{\delta} = \min_{j=1,\ldots,k} (\delta_j a^j). \tag{90}$$

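Note that, since $\delta$ and $\tilde{\delta}$ are independent,

$$E(1 - 2\delta)\min(\delta, \tilde{\delta}) = \int_0^1 (1 - 2x)\, x \Pr(\delta \in dx, \tilde{\delta} > x) + \int_0^1 \!\!\int_0^x (1 - 2x)\, y \Pr(\delta \in dx, \tilde{\delta} \in dy)$$

$$= \int_0^1 (1 - 2x)\, x \Pr(\tilde{\delta} > x)\, dx + \int_0^1 \!\!\int_0^x (1 - 2x)\, y\, \frac{\partial}{\partial y}[-\Pr(y < \tilde{\delta} < x)]\, dy\, dx. \tag{91}$$

Integrate by parts in the integral on $dy$. The boundary terms vanish and so

$$E(1 - 2\delta)\min(\delta, \tilde{\delta}) = \int_0^1 (1 - 2x)\, x \Pr(\tilde{\delta} > x)\, dx + \int_0^1 \!\!\int_0^x (1 - 2x) \Pr(y < \tilde{\delta} < x)\, dy\, dx$$

$$= \int_0^1 (1 - 2x)\, x \Pr(\tilde{\delta} > x)\, dx + \int_0^1 \!\!\int_0^x (1 - 2x)[\Pr(\tilde{\delta} > y) - \Pr(\tilde{\delta} > x)]\, dy\, dx$$

$$= \int_0^1 \!\!\int_0^x (1 - 2x) \Pr(\tilde{\delta} > y)\, dy\, dx = \int_0^1 \!\!\int_y^1 (1 - 2x) \Pr(\tilde{\delta} > y)\, dx\, dy = -\int_0^1 y(1 - y) \Pr(\tilde{\delta} > y)\, dy. \tag{92}$$

Since from (90),

$$\Pr(\tilde{\delta} > y) = \prod_{j=1}^{k} \left(1 - \frac{y}{a^j}\right)^{+}, \tag{93}$$

(20) follows immediately from (89) and (92).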
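The identity (92) can be spot-checked numerically for $k = 1$ and $a > 0$, where $\tilde{\delta} = a\delta_1$. The sketch below compares a two-dimensional midpoint-rule estimate of $E(1 - 2\delta)\min(\delta, a\delta_1)$ with a one-dimensional midpoint-rule estimate of the right side of (92). The grid sizes and function names are arbitrary choices of ours, and the check is an illustration rather than a proof.

```python
def lhs_expectation(a, n=400):
    """E[(1 - 2 d0) * min(d0, a * d1)] for independent d0, d1 uniform on (0, 1), k = 1."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += (1.0 - 2.0 * x) * min(x, a * y)
    return total * h * h

def rhs_integral(a, n=4000):
    """-(integral over (0,1) of y (1 - y) (1 - y/a)^+), the right side of (92) for k = 1."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        total += y * (1.0 - y) * max(1.0 - y / a, 0.0)
    return -total * h
```

For $a = 2$ both sides are close to $-1/8$, so (89) gives $\epsilon_1^{*2} = 1 + 6 \cdot 4 \cdot \tfrac{1}{8} = 4$, which agrees with $1 + a^2 - |a|/2$ obtained from (21) with $P_1(x) = 1/6 - x/12$.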
Equation (21) follows from (20) by setting

$$P_k(x) \equiv \int_0^1 du\; u(1 - u) \prod_{l=1}^{k} (1 - ux^l). \tag{94}$$

Now let

$$R_k(x, u) \equiv \prod_{l=1}^{k} (1 - ux^l) = \sum_{j=0}^{k} b_j u^j, \tag{95}$$

where the $b$'s depend on $k$ and $x$. Substitution in (94) yields the first part of (23) since $\int_0^1 du\; u(1 - u)u^l = 1/[(l + 2)(l + 3)]$. However, one sees from the product form for $R_k$ in (95) that

$$(1 - ux) R_k(x, ux) = (1 - ux^{k+1}) R_k(x, u)$$

so that

$$(1 - ux) \sum_{j=0}^{k} b_j (ux)^j = (1 - ux^{k+1}) \sum_{j=0}^{k} b_j u^j.$$

Equating the coefficient of $u^l$ on both sides of this equation yields

$$b_0 = 1, \qquad b_l(1 - x^l) = (x^{k+1} - x^l) b_{l-1}, \qquad l = 1, 2, \ldots, k,$$

from which (22) and (23) then follow directly.

388 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1980 



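4.2 Y's are exponential

Let $Y_j$ have the density

$$p_Y(y) = \begin{cases} e^{-(y+1)}, & y \ge -1 \\ 0, & y < -1. \end{cases} \tag{96}$$

Now $\lambda = -1$ and $\mu = \infty$ so that (81) is

$$\ell = -\min\!\left(1, \frac{1 + Z_1}{a}, \frac{1 + Z_2}{a^2}, \ldots, \frac{1 + Z_k}{a^k}\right), \qquad m = \infty, \qquad a > 0, \tag{97}$$

while

$$\bar{\ell} = -\min\!\left(1, \frac{1 + Z_2}{b^2}, \frac{1 + Z_4}{b^4}, \ldots\right), \qquad \bar{m} = \min\!\left(\frac{1 + Z_1}{b}, \frac{1 + Z_3}{b^3}, \frac{1 + Z_5}{b^5}, \ldots\right), \qquad a = -b < 0. \tag{98}$$

From (65), (77), and (78), we then find

$$E(Y_{n-(k+1)} \mid X_{n-k}^{n-1}) = I_1/I_2 = \begin{cases} \ell + \dfrac{1}{B}, & a > 0 \\[2ex] \dfrac{1}{B} + \dfrac{\bar{m} e^{-B\bar{m}} - \bar{\ell} e^{-B\bar{\ell}}}{e^{-B\bar{m}} - e^{-B\bar{\ell}}}, & a = -b < 0, \end{cases}$$

$$B = 1 + a + a^2 + \cdots + a^k = \frac{1 - a^{k+1}}{1 - a}. \tag{99}$$

Using (97) to (99), it is a matter of straightforward algebra to convert (79) into the results stated as (25), (26), and (28). Along the way, use must be made of (76), which defines the $Z$'s in terms of the observed $X$'s. We omit the details here.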

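The evaluation of $\epsilon_k^{*2}$ is somewhat more complicated. Equations (80), (59), (83), and (99) give

$$\epsilon_k^{*2} = 1 - a^{k+2} E(\xi_1 I_1/I_2) \tag{100}$$

and from (99) it is seen that the cases $a > 0$ and $a < 0$ must be treated separately.

When $a > 0$, $I_1/I_2 = \ell + (1/B)$. Then (100) becomes

$$\epsilon_k^{*2} = 1 - a^{k+2} E\!\left[\xi_1\!\left(\frac{1}{B} + \xi_{k+1} + \frac{1}{a^k} \max_{j=1,\ldots,k+1} [a^{j-1}(-1 - \xi_j)]\right)\right] = 1 + a^2 E(\theta_1 - 1)\min[\theta_1, a\theta_2, \ldots, a^k\theta_{k+1}], \tag{101}$$

$$\theta_j \equiv 1 + \xi_j, \qquad j = 1, 2, \ldots, k + 1. \tag{102}$$

Here we have used (84) to express $\ell$ in terms of the $\xi$'s and have used the fact that $E\xi_1 = 0$ and that the $\xi$'s are independent. Now define

$$U = \min(a\theta_2, a^2\theta_3, \ldots, a^k\theta_{k+1}).$$

Since each $\theta$ has the one-sided exponential distribution

$$p_\theta(\theta) = \begin{cases} e^{-\theta}, & \theta \ge 0 \\ 0, & \theta < 0, \end{cases} \tag{103}$$

one readily finds that the density for $U$ is

$$p_U(u) = \begin{cases} \dfrac{1 - \delta}{\delta}\, e^{-(1 - \delta)u/\delta}, & u > 0 \\ 0, & u < 0, \end{cases}$$

with $\delta$ as in (26). Furthermore, $U$ and $\theta_1$ are independent. The calculation of (101) then reads

$$\epsilon_k^{*2} = 1 + a^2 E(\theta_1 - 1)\min(\theta_1, U) = 1 + a^2\!\left[\int_0^\infty d\theta\,(\theta - 1) \int_0^\theta du\; u\, p_U(u)\, p_{\theta_1}(\theta) + \int_0^\infty d\theta\,(\theta - 1)\,\theta \int_\theta^\infty du\; p_U(u)\, p_{\theta_1}(\theta)\right].$$

The integrals are readily evaluated and (27) results.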

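The computation of $\epsilon_k^{*2}$ is more burdensome when $a = -b < 0$. From (99) and (100),

$$\epsilon_k^{*2} = 1 - (-1)^k b^{k+2} E\!\left[\xi_1\, \frac{\bar{m} e^{-B\bar{m}} - \bar{\ell} e^{-B\bar{\ell}}}{e^{-B\bar{m}} - e^{-B\bar{\ell}}}\right], \tag{104}$$

where from (84)

$$\bar{\ell} = \xi_{k+1} + \frac{1}{b^k} \max[-b^k\theta_{k+1}, -b^{k-2}\theta_{k-1}, \ldots], \qquad \bar{m} = \xi_{k+1} + \frac{1}{b^k} \min[b^{k-1}\theta_k, b^{k-3}\theta_{k-2}, \ldots],$$

and we have again used (102). Now write

$$\bar{B} = B/b^k, \qquad \bar{\ell} = \xi_{k+1} - b^{-k}U, \qquad \bar{m} = \xi_{k+1} + b^{-k}V.$$

Now (104) becomes

$$\epsilon_k^{*2} = 1 - (-1)^k b^2 E\!\left[(\theta_1 - 1)\, \frac{V e^{-\bar{B}V} + U e^{\bar{B}U}}{e^{-\bar{B}V} - e^{\bar{B}U}}\right], \tag{105}$$

$$U = \min[b^k\theta_{k+1}, b^{k-2}\theta_{k-1}, \ldots], \qquad V = \min[b^{k-1}\theta_k, b^{k-3}\theta_{k-2}, \ldots], \tag{106}$$

where $\theta_1, \theta_2, \ldots, \theta_{k+1}$ are i.i.d. random variables having the density (103).

Suppose now that $k = 2l$ is even. We have

$$U = \min(\theta_1, Z), \qquad Z = \min(b^2\theta_3, b^4\theta_5, \ldots, b^{2l}\theta_{2l+1}), \qquad V = \min(b\theta_2, b^3\theta_4, \ldots, b^{2l-1}\theta_{2l})$$

and

$$p_Z(z) = \begin{cases} B_e e^{-B_e z}, & z \ge 0 \\ 0, & z < 0, \end{cases} \qquad B_e = \frac{1}{b^2} + \frac{1}{b^4} + \cdots + \frac{1}{b^{2l}},$$

$$p_V(v) = \begin{cases} B_o e^{-B_o v}, & v \ge 0 \\ 0, & v < 0, \end{cases} \qquad B_o = \frac{1}{b} + \frac{1}{b^3} + \cdots + \frac{1}{b^{2l-1}}.$$

Since $\theta_1$, $Z$, and $V$ are independent random variables,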
et 2 =l-b 2 J dvpv(v) 



no-* + /fe fl * 

dd p ei (d)(d - 1) \ dzp z (z) _ ^ 

e -Bv _ e BB 



[[ 



+ \ de P8) (d)(d-i) f dz Pz (z) ve Bv + yet 

Jo Jo e~ Bv - e Bz 



Inside the brackets here, the $z$ integration can be carried out immediately in the first term. Interchange the order of integration of $\theta$ and $z$ in the second term and carry out the $\theta$ integration. There results

ef=l- b 2 Bo \ dv\ dS e - m e- B ^ +v) ^-_ -[(B e + 1)0 - 1]. 

Jo Jo «-*<*• _i 

Now change variables of integration to x and v via v + = x, — v. 
Tedious, but straightforward integration now gives 



where 



Now 



2 | Bo + S 2 



g ,*yaj"^ 



e -& 



(107) 



(108) 



from (99). Since $k$ is even in the present case, $\bar{B} > 0$ and the factor $[1 - e^{-\bar{B}x}]^{-1}$ in (108) can be expanded as $\sum_{n=0}^{\infty} e^{-n\bar{B}x}$. Term-by-term integration, insertion in (107), and a little rearranging yield (29) for $k$ even.

The case of odd $k$ proceeds in a similar manner. Now $\bar{B}$ as given by (109) will be negative if $b > 1$. An integral of the form (108) that occurs in the calculation must now be expanded in two different ways depending on whether $\bar{B}$ is positive or negative. This gives rise to the several forms for $\lambda$ in (29). We omit the straightforward details of calculation here.

4.3 Y's are discrete binary

Now let $Y$ take only two values:

$$\Pr[Y = \lambda] = p, \qquad \Pr[Y = \mu] = q \equiv 1 - p, \qquad \lambda = -\sqrt{q/p}, \qquad \mu = \sqrt{p/q},$$

so that as always $EY = 0$, $EY^2 = 1$. When $M = 1$ and $a_1 = -1$, the $d_l = 1$, $A_1^{(l)} = -1$ and (38), (47), and (58) become

$$X_n = Y_n - Y_{n-1},$$
$$f_k^* = -Z_k - E(Y_{n-(k+1)} \mid Z_1, \ldots, Z_k),$$
$$Z_k = Y_{n-1} - Y_{n-(k+1)},$$
$$\vdots$$
$$Z_1 = Y_{n-k} - Y_{n-(k+1)}. \tag{110}$$
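Here each $Z$ can take only values $0$, $\mu - \lambda$, $\lambda - \mu$. One readily finds

$$\Pr[Y_{n-(k+1)} = \lambda \mid Z_1^k] = \begin{cases} 0, & \text{some } Z < 0 \\ 1, & \text{some } Z > 0 \\ \dfrac{p^{k+1}}{p^{k+1} + q^{k+1}}, & \text{all } Z\text{'s} = 0, \end{cases}$$

$$\Pr[Y_{n-(k+1)} = \mu \mid Z_1^k] = \begin{cases} 0, & \text{some } Z > 0 \\ 1, & \text{some } Z < 0 \\ \dfrac{q^{k+1}}{p^{k+1} + q^{k+1}}, & \text{all } Z\text{'s} = 0, \end{cases}$$

so that

$$E(Y_{n-(k+1)} \mid Z_1^k) = \begin{cases} \lambda, & \text{at least one } Z > 0 \\ \mu, & \text{at least one } Z < 0 \\ \dfrac{\lambda p^{k+1} + \mu q^{k+1}}{p^{k+1} + q^{k+1}}, & \text{all } Z\text{'s zero}. \end{cases} \tag{111}$$

Equation (32) follows then from (110) and (111).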
For the figure of merit of the best predictor, we have in this case 

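$$\begin{aligned}
\epsilon_k^{*2} &= 2 - E X_n f_k^* \\
&= 2 - E\{[Y_n - Y_{n-1}]\,[-Y_{n-1} + Y_{n-(k+1)} - E(Y_{n-(k+1)} \mid Z_1^k)]\} \\
&= 1 - E[Y_{n-1}E(Y_{n-(k+1)} \mid Z_1^k)].
\end{aligned} \tag{112}$$

Now

$$\begin{aligned}
E\,Y_{n-1}E(Y_{n-(k+1)} \mid Z_1^k) ={}& \lambda\,\frac{\lambda p^{k+1} + \mu q^{k+1}}{p^{k+1} + q^{k+1}}\,\Pr[Y_{n-1} = \lambda, \text{ all } Z\text{'s} = 0] \\
&+ \mu\,\frac{\lambda p^{k+1} + \mu q^{k+1}}{p^{k+1} + q^{k+1}}\,\Pr[Y_{n-1} = \mu, \text{ all } Z\text{'s} = 0] \\
&+ \lambda^2 \Pr[Y_{n-1} = \lambda, \text{ some } Z > 0] \\
&+ \lambda\mu \Pr[Y_{n-1} = \mu, \text{ some } Z > 0] \\
&+ \lambda\mu \Pr[Y_{n-1} = \lambda, \text{ some } Z < 0] \\
&+ \mu^2 \Pr[Y_{n-1} = \mu, \text{ some } Z < 0].
\end{aligned} \tag{113}$$

The six probabilities Pr[ ] listed here are readily seen to be $p^{k+1}$, $q^{k+1}$, $p^2[1 - p^{k-1}]$, $qp$, $pq$, and $q^2[1 - q^{k-1}]$, respectively. Equation (35) then follows from (112) and (113) by simple algebra.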
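
The six-term decomposition (113) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the filter $X_n = Y_n - Y_{n-1}$ and the two-point values $\lambda = -\sqrt{q/p}$, $\mu = \sqrt{p/q}$, enumerates every outcome of $Y_{n-(k+1)}, \ldots, Y_n$, and compares the brute-force minimum mean-squared error with $1 - E[Y_{n-1}E(Y_{n-(k+1)} \mid Z_1^k)]$ assembled from the six probabilities. The function names are hypothetical.

```python
from itertools import product
from math import sqrt


def two_point_values(p):
    """Return (lam, mu, q) with EY = 0 and EY^2 = 1."""
    q = 1.0 - p
    return -sqrt(q / p), sqrt(p / q), q


def brute_force_error(p, k):
    """Minimum MSE of predicting X_n from X_{n-1}, ..., X_{n-k} by enumeration."""
    lam, mu, q = two_point_values(p)
    vals, probs = (lam, mu), (p, q)
    groups = {}  # observed (X_{n-k}, ..., X_{n-1}) -> [P, E[X_n; .], E[X_n^2; .]]
    for idx in product((0, 1), repeat=k + 2):
        y = [vals[i] for i in idx]  # Y_{n-(k+1)}, ..., Y_{n-1}, Y_n
        w = 1.0
        for i in idx:
            w *= probs[i]
        x = [y[j + 1] - y[j] for j in range(k + 1)]  # X_{n-k}, ..., X_n
        key = tuple(round(v, 9) for v in x[:-1])
        acc = groups.setdefault(key, [0.0, 0.0, 0.0])
        acc[0] += w
        acc[1] += w * x[-1]
        acc[2] += w * x[-1] ** 2
    # E[X_n^2] - E[(E[X_n | observations])^2], summed group by group
    return sum(m2 - m1 * m1 / m0 for m0, m1, m2 in groups.values())


def closed_form_error(p, k):
    """1 - E[Y_{n-1} E(Y_{n-(k+1)} | Z)] built from the six terms of (113)."""
    lam, mu, q = two_point_values(p)
    c = (lam * p ** (k + 1) + mu * q ** (k + 1)) / (p ** (k + 1) + q ** (k + 1))
    cross = (lam * c * p ** (k + 1) + mu * c * q ** (k + 1)
             + lam ** 2 * p ** 2 * (1 - p ** (k - 1))
             + lam * mu * q * p + lam * mu * p * q
             + mu ** 2 * q ** 2 * (1 - q ** (k - 1)))
    return 1.0 - cross


print(brute_force_error(0.3, 3), closed_form_error(0.3, 3))
```

The enumeration conditions on the observed X's directly; this is equivalent to conditioning on the Z's because the Z's are sums of consecutive X's.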
When a = — 1, the corresponding equations are 

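$$\begin{aligned}
X_n &= Y_n + Y_{n-1} \\
Z_k &= Y_{n-1} - (-1)^k Y_{n-(k+1)} \\
Z_{k-1} &= Y_{n-2} - (-1)^{k-1} Y_{n-(k+1)} \\
&\;\;\vdots \\
Z_1 &= Y_{n-k} - (-1)\,Y_{n-(k+1)} \\
f_k^* &= Z_k - (-1)^{k+1} E(Y_{n-(k+1)} \mid Z_1^k) \\
\epsilon_k^{*2} &= 1 - (-1)^k E[Y_{n-1}E(Y_{n-(k+1)} \mid Z_1^k)].
\end{aligned}$$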

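Note that the Z's with odd subscript can take the values $\lambda + \mu$, $2\lambda$, or $2\mu$, while the Z's with even subscript can take the values 0, $\lambda - \mu$, or $\mu - \lambda$. Let there be s Z's with even subscripts and t Z's with odd subscripts. Then

$$\Pr[Y_{n-(k+1)} = \lambda \mid Z_1^k] = \begin{cases} \dfrac{p^{s+1}q^t}{p^{s+1}q^t + q^{s+1}p^t}, & \text{all } Z\text{'s} = (\lambda + \mu) \text{ or zero} \\ 1, & \text{some } Z = 2\lambda \text{ or } \mu - \lambda \\ 0, & \text{otherwise,} \end{cases}$$

$$\Pr[Y_{n-(k+1)} = \mu \mid Z_1^k] = \begin{cases} \dfrac{q^{s+1}p^t}{p^{s+1}q^t + q^{s+1}p^t}, & \text{all } Z\text{'s} = (\lambda + \mu) \text{ or zero} \\ 1, & \text{some } Z = 2\mu \text{ or } \lambda - \mu \\ 0, & \text{otherwise,} \end{cases}$$

and it follows that

$$E(Y_{n-(k+1)} \mid Z_1^k) = \begin{cases} \dfrac{\lambda p^{s+1}q^t + \mu q^{s+1}p^t}{p^{s+1}q^t + q^{s+1}p^t}, & \text{all } Z\text{'s} = (\lambda + \mu) \text{ or zero} \\ \lambda, & \text{some } Z = 2\lambda \text{ or } \mu - \lambda \\ \mu, & \text{some } Z = 2\mu \text{ or } \lambda - \mu. \end{cases}$$

Equation (33) then follows at once. The computation of $\epsilon_k^{*2}$ is now a little more complicated in that the cases k even and k odd must be treated separately. We list the key equation in the computation.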

*ty fiy l7 * n _ v + v+/"ry 

A[ Y n -iE ( Yn-{k+l) | £> i) J - +1 , - +1 t 






+ x .|Jtt-rtn 



^ Wd - <ry> + (i -y-v>] 



<z 2 

Here the upper choice corresponds to k even and the lower choice to 
k odd. Straightforward manipulations now lead to (35) and (36). We 
omit the tedious details here. 

V. THE CASE $a_j = \alpha^j$, j = 1, 2, ..., M
We now consider the exponential filter

$$X_n = \sum_{j=0}^{M} \alpha^j Y_{n-j}, \qquad M \ge 1. \tag{114}$$

For the parameters of Section III, we have $a_j = \alpha^j$, j = 0, 1, ..., M, and all other $a_j$ are zero. Equations (49) or (48) then yield

rf>(Af+l) + l+s = 

J - 0, 1, 2, • • • 

s-1,2, ...,JI#-1. (115) 

From (52) we then find 









7 = 0,1,2,... 

/=1,2,...,M 

s= 1,2, .... M. (116) 
As before, we suppose that the density $p_Y(y) = g(y)$ has support on $\lambda < y < \mu$. To evaluate (66) and (67) it is necessary to determine the M-dimensional region of support for the integrands. The hyperplane-boundary constraints on $y_1, \ldots, y_M$ are

$$\lambda < y_i < \mu, \qquad i = 1, 2, \ldots, M \tag{117}$$

M 

\<Zi- I Aj"j,<ju (118) 

y'-i 

1-1,2,...,*. 

Now let

$$k = p(M + 1) + 1 + s \tag{119}$$

for some p = 0, 1, 2, ... and some integer s such that $0 \le s \le M$. In view of (116), the constraints (118) on $y_1, \ldots, y_M$ become

M 

A < Z a w+ i )+ i - a o{M+1) I a'y/ < ju, a = 0, 1, -.., p 

A < Z o( m + i, + i + , + a |o+1HM+ Vw ^ M 

a = 0, 1, -.., p-1, j=l, 2, ...,M 

A < Zp^+D+i+y + a ( " +l),M+ V+w < M 

7=1,2, ...,s. 

Notice that most of the hyperplane boundaries are parallel to the 
coordinate planes and that the remaining ones are all parallel. This is 
because of the simple exponential form of the filter (114). Thus we 
find the support for the integrands of (66) and (67) to be given by 

$$l_j < y_j < m_j, \quad j = 1, 2, \ldots, M, \qquad A < \sum_{j=1}^{M} \alpha^j y_j < B \tag{120}$$



where 



A = max 

o-0,l p 



Z„(M+i)+i — M 



B = min 

o-O.l, ..-,/> 



lj = max 



a o(A/+l) 



(Af+1)+1 — A 
a "(M+\) 

A — Z(M+\)+\-j A — Z2(M+\)+l-j 
*» ^AfH ' a 2<J\f+l) 
m, = mm



A — Zt(M+\)+\-j 



(1 — Z(M+\)+l-j fi — Z2(M+\)+\-j 



M+l 



2(Af+l) » 



/* — Zt(M+\)+l-j 



J(M+U 



, IA y= 1, 2, ...,M-s 

\p + l, j = M-s+ 1, ...,M. 



(121) 



The quantities A, B, $l_j$, $m_j$, for j = 1, ..., M, are random variables and, via (58), can be expressed in terms of the Y's. Equations (39) read in the present case

a(M+l)+l-j — Xo(M+l)-j ~ Ot I-j 

y = l,2, ...,M, a -1,2, ..., p 

Z— \^ „,(p+D(Af+l)-\? 

(p+i)(jw+i)+i-> — x (p+iMAf+n-v ~ a J-y 

y' = Af-s+l, M-s + 2, ...,M 

ZaCM+lHl - 2 > o(A f + l» + a o(A/+1) J] flfY-| 

/-l 



a = 0, 1, ...,p 

where for notational convenience we have put 

Y u = Y n -k + u, u = 0, ±1, ±2, • • • . 

Equations (121) now become 

Y a (M+D — H 



(122) 



(123) 



M 



A = X a'iw + max 

/-l o-0,l p 



M 



B = £ a 1 ?-/ + min 

/-l o-0.1,...,p 




/> = It-j + max 

m y = Y-j + min 



v- A ~ Y o(M +\)-j 
A " *-J> a(A/+D — » a = l, z, 



^ jl—YoiM+D-j 

/*-'->» -^jittj ; a=l, 2, •••,* 
[p, 7-1,2, ..-, M-s

jp + 1, j = M-s+l,M-s + 2, ...,M. 



(124) 



5.1 Y's exponential 

Let the Y's be i.i.d. with density (96). Now $\lambda = -1$, $\mu = \infty$. Then (67) can be written



h = c \ dyi • • • dy M 



e ' e '-"" 



I*'** (125) 

$I_1$ will be given by a similar expression with an extra factor of $y_\nu$ in the integrand.
It is convenient now to set $x_j = (y_j - l_j)\alpha^j$ to obtain



h = c | dxi • • • | c?xm e ' 



= c I rfxi • • • 
Jo Jo 



where 



and 



£*,<£ (126) 



8 - B - X arty (127) 



ft-a-'[!/f-l 
[1 — a 



1 _ a (P+DW+» a -> 

l-or M+I l-a"" 



_ a (p + D(Af + l) ^ y = 1, 2, • • • , M - S 

( P+ 2)(Af + i) ^ y = M - s + 1, • • • , M. (128) 

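But the integral in (126) can be easily evaluated. Denote $I_2/c$ by $J_M(\beta_1, \ldots, \beta_M; \delta)$. Then integration on $x_M$ yields the recurrence

$$J_M(\beta_1, \ldots, \beta_M; \delta) = \frac{1}{\beta_M}\left[e^{\beta_M\delta}\,J_{M-1}(\beta_1 - \beta_M, \beta_2 - \beta_M, \ldots, \beta_{M-1} - \beta_M; \delta) - J_{M-1}(\beta_1, \ldots, \beta_{M-1}; \delta)\right]$$

with $J_1(\beta_1; \delta) = [e^{\beta_1\delta} - 1]/\beta_1$. The solution is

$$J_M(\beta_1, \ldots, \beta_M; \delta) = (-1)^M\left[\frac{1}{\prod_{j=1}^{M}\beta_j} - \sum_{j=1}^{M}\frac{e^{\delta\beta_j}}{\beta_j\prod_{k \ne j}(\beta_k - \beta_j)}\right] \tag{129}$$
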



as can be shown by induction. 
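
The induction can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not the paper's computation: it takes $J_M$ to be the integral of $\exp(\sum_j \beta_j x_j)$ over $x_j > 0$, $\sum_j x_j < \delta$, assumes the $\beta_j$ are distinct and nonzero (and remain so after the shifts $\beta_j - \beta_M$), and compares the recurrence with the closed form (129). The function names are hypothetical.

```python
from math import exp, prod


def J_recursive(betas, delta):
    """J_M obtained by integrating out x_M, as in the recurrence."""
    *rest, b_last = betas
    if not rest:
        return (exp(b_last * delta) - 1.0) / b_last
    shifted = [b - b_last for b in rest]
    return (exp(b_last * delta) * J_recursive(shifted, delta)
            - J_recursive(rest, delta)) / b_last


def J_closed(betas, delta):
    """Closed form (129); requires distinct, nonzero betas."""
    M = len(betas)
    total = 1.0 / prod(betas)
    for j, bj in enumerate(betas):
        denom = bj * prod(bk - bj for k, bk in enumerate(betas) if k != j)
        total -= exp(delta * bj) / denom
    return (-1) ** M * total


betas = [0.5, -1.2, 2.0]
print(J_recursive(betas, 0.7), J_closed(betas, 0.7))
```

When two of the $\beta_j$ coincide, (129) must be read as a limit; the sketch does not handle that case.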

As noted, In differs from (125) by a factor y t in the integrand, or 
from (126) by a factor Xi/a l in the integrand. From this, it is seen that 
In = a~'(dJ M /dPi) so that from (65) 

E(Yn-k-l I XS3) = -t ^r log JMPu ft, •.. , p M \ B) (130) 
a dpi 

where $J_M$ is given by (129) and the other parameters by (127) and (128). Expression (130) must be inserted into (47) to obtain the complete estimator. From (115) and (116) we see that the form of the best estimator depends on s defined in (119). Thus

ft(X n n z\) = -I a j{M+1) (X n - J{M+ » - «X„-vw + i)-i) 

r a ip+iM+» £ a'EiY^-tlXZH), s = M 

+ [ -a {p + 1){M+x) E{Y n -k -(M-s) |XS=1), s = 0, 1, . . ., M - 1 (131) 

where the first sum is vacuous if p = 0. 

When s = M, the last sum in (131) can be carried out. We emphasize 
how complicated the best estimator is for this relatively simple case 
by writing it out in full: 

k = v(M + 1) 

ftOSb = -2 <* J{M+1) (X n -j {M+1) - aXn-JM+V-l) 
y'-i 

+ a' iM+l) B + (-1) M H M /J M , 

tf + yl 

L o M Bf>, 



Bm~- .nA + 



n Pi & pi n (fi» - Pi) ' 



k+i 



1 - a v{M+l) - a- j a - a ( " +1)(M+1) ) 



A- ; iBi • (132) 



Here $J_M$ is given by (129) and the observable $\delta$ is given by (127), with B and the $l_j$ being given by (121). Finally, from (58) the Z's of these equations are given in terms of the observable X's by

Zi = I djXn-k-u+x-i), i = 1> 2, • • • , k (133) 

7-0 

with the d's as in (115). 

The expression for $\epsilon_k^{*2}$ can be reduced to a two-dimensional integral, but there seems to be little point in exhibiting this exceedingly complicated expression.

5.2 Y's uniform 

Explicit formulas for the best estimator $f_k^*$ can be worked out when the Y's are i.i.d. with the density (85). Again, the results are exceedingly complicated.

We now have $\lambda = -\mu = -\sqrt{3}$ and (67) can be written



J-m x rm u 

dyx • • • dy M 



A < J a'yi < B (134) 

with the l's, m's, A, and B given by (121). The quantity $I_1$ of (66) is similar to (134) except for a factor of $y_\nu$ in the integrand.
To evaluate (134), let

$$\alpha^j y_j = x_j, \qquad \alpha^j m_j = \tilde m_j, \qquad \alpha^j l_j = \tilde l_j$$

so that

J 2 - c" I dxi • • • I dx M . 



eta — 



A <£*,<£. (135) 

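We can now interpret $I_2/[c''\prod_{j=1}^{M}(\tilde m_j - \tilde l_j)]$ as $\Pr[A < \sum_{i=1}^{M} T_i < B]$, where the T's are independent random variables and $T_i$ is uniformly distributed between $\tilde l_i$ and $\tilde m_i$, i = 1, 2, ..., M. The characteristic function of the random variable $S = \sum T_i$ is

$$\Phi_S(f) = E e^{ifS} = \prod_{j=1}^{M}\frac{e^{if\tilde m_j} - e^{if\tilde l_j}}{if(\tilde m_j - \tilde l_j)}$$

so that the density for S is

$$\begin{aligned}
p_S(s) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} df\, e^{-ifs}\,\Phi_S(f) \\
&= K\int_{\mathcal C}\frac{df}{f^M}\, e^{-ifs}\prod_{j=1}^{M}\left[e^{if\tilde m_j} - e^{if\tilde l_j}\right] \\
&= K\sum_{j,k}(-1)^j\int_{\mathcal C} df\,\frac{e^{if(\Gamma_{jk} - s)}}{f^M},
\end{aligned}$$

$$K = 1\Big/\Big(2\pi i^M\prod_{j=1}^{M}(\tilde m_j - \tilde l_j)\Big). \tag{136}$$

Here $\mathcal C$ is a contour in the complex plane that runs along the real axis except for a semicircular excursion into the negative imaginary half-plane along a circle with center at the origin. The $2^M$ quantities $\Gamma_{jk}$ arise from multiplying out the product and are given by

$$\Gamma_{jk} = \tilde l_{i_1} + \tilde l_{i_2} + \cdots + \tilde l_{i_j} + \tilde m_{i_{j+1}} + \cdots + \tilde m_{i_M} \tag{137}$$

where $i_1, i_2, \ldots, i_M$ is a permutation of the integers 1, 2, ..., M and $i_1, i_2, \ldots, i_j$ are chosen in all $\binom{M}{j}$ ways. Thus

$$k = 1, 2, \ldots, \binom{M}{j}, \qquad j = 0, 1, \ldots, M. \tag{138}$$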

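Now

$$\frac{1}{2\pi i^M}\int_{\mathcal C}\frac{e^{ifu}}{f^M}\,df = \begin{cases} \dfrac{u^{M-1}}{(M-1)!}, & u > 0 \\ 0, & u < 0 \end{cases}$$

as can be readily established by contour integration, so that

$$p_S(s) = \frac{1}{(M-1)!\,\prod_{j=1}^{M}(\tilde m_j - \tilde l_j)}\sum_{j=0}^{M}(-1)^j\sum_{k=1}^{\binom{M}{j}}\left[(\Gamma_{jk} - s)^+\right]^{M-1}. \tag{139}$$

Thus since $\Pr[A < S < B] = \int_A^B p_S(s)\,ds$ we find finally

$$I_2 = \frac{c''}{M!}\sum_{j=0}^{M}(-1)^j\sum_{k=1}^{\binom{M}{j}}\left\{\left[(\Gamma_{jk} - A)^+\right]^M - \left[(\Gamma_{jk} - B)^+\right]^M\right\}. \tag{140}$$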
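
The inclusion-exclusion sum behind (140) is easy to evaluate directly. As a minimal sketch (the function name and argument layout are assumptions made for illustration), the following computes $\Pr[A < S < B]$ for a sum of independent uniform variates, taking $(x)^+ = \max(x, 0)$ and letting j count the lower endpoints that enter $\Gamma_{jk}$:

```python
from itertools import combinations
from math import factorial, prod


def prob_sum_uniform(lows, highs, A, B):
    """Pr[A < T_1 + ... + T_M < B] for independent T_i ~ Uniform(lows[i], highs[i])."""
    M = len(lows)
    total = 0.0
    for j in range(M + 1):  # j = number of lower endpoints used in Gamma_jk
        for chosen in combinations(range(M), j):
            gamma = sum(lows[i] if i in chosen else highs[i] for i in range(M))
            total += (-1) ** j * (max(gamma - A, 0.0) ** M
                                  - max(gamma - B, 0.0) ** M)
    widths = prod(h - l for l, h in zip(lows, highs))
    return total / (factorial(M) * widths)


# Two unit uniforms: the sum has a triangular density on (0, 2).
print(prob_sum_uniform([0.0, 0.0], [1.0, 1.0], 0.5, 1.5))
```

The sum has $2^M$ terms, so this direct evaluation is practical only for moderate M.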

For the numerator in (65), we have 

Ivi/c" II (m - /,-) =1 dt \ ds tp T ,s(t> s > ( 141 > 

with S and the T's as before. Now

$$p_{T_\nu,S}(t, s) = p_{T_\nu}(t)\,p_{S \mid T_\nu}(s \mid t) = p_{T_\nu}(t)\,p_{S_\nu}(s - t) \tag{142}$$

where

$$S_\nu = \sum_{j \ne \nu} T_j$$

is independent of $T_\nu$. By the same steps that led to (139)

(T) 

Ps.(s) = (M * Er S'c-D' S [(Tjf-^T" 8 d43) 

(M - 2)! II (my - (,) y_o ._j 

where the $\Gamma'_{jk}$ are formed in a manner analogous to the $\Gamma_{jk}$, only $\tilde l_\nu$ and $\tilde m_\nu$ are omitted from (137). Inserting (143) and (142) into (141), one finds finally

(M + 1)! y-o •_, 

. [(M + l)m, - (Tfr - A + m„) + ] 

- [(r;* - a + / r ) + ] M [(jif + 1)/, - (r;* - a + u + ] 

- [(TJ* - B + m,) + ] M [(M + l)m, - (Y) k -B + m,) + ] 

+ [(r;* - b + w*rtdf + M - (H* - b + *,)*]. <M4) 

The ratio of (141) to (140), which is independent of $c''$, is $E(Y_{n-k-\nu} \mid X_{n-k}^{n-1})$. The best estimator is obtained by using this quantity in (131).

VI. ENTROPY INEQUALITY 

We begin by giving some definitions and stating some facts concerning the Shannon differential entropy of a random variable. Let U be a real-valued random variable with probability density function $p_U(u)$, $-\infty < u < \infty$. The (differential) entropy of U is

$$H(U) = -\int_{-\infty}^{\infty} p_U(u)\log p_U(u)\,du. \tag{145}$$

Intuitively, H(U) can be thought of as a measure of the spread of the density function $p_U(\cdot)$. The following facts are easily verifiable (see, for example, Refs. 3 and 4).

(a) H(U) can take any value in $[-\infty, +\infty]$. However, for s > 0,

$$H(U) \le \frac{1}{s}\log\left[\frac{2^s\,\Gamma^s(1/s)\,e\,E|U|^s}{s^{s-1}}\right], \tag{146}$$

with equality when $p_U(u)$ is of the form $K_1\exp\{-K_2|u|^s\}$, $-\infty < u < \infty$. The constant $K_2$ is a parameter and $K_1$ is chosen so that $\int p_U(u)\,du = 1$. Thus $E|U|^s < \infty$, for some s > 0, implies $H(U) < \infty$.

(b) For constant a, $-\infty < a < \infty$,

$$H(U) = H(U + a). \tag{147}$$

Inequality (146), for s = 2, and (147) imply that

$$H(U) \le \tfrac{1}{2}\log[2\pi e(\operatorname{var} U)],$$

with equality when U is Gaussian.

(c) For $-\infty < a < \infty$,

$$H(aU) = \log|a| + H(U). \tag{148}$$
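
Facts (b) and (c) can be illustrated numerically. The sketch below is only an illustration: it assumes a Laplace density with unit variance (a choice made here, not taken from the text), estimates the differential entropy in nats by a midpoint rule, and compares the result with the Gaussian bound and with the scaling rule (148). The helper names are hypothetical.

```python
import math


def entropy_numeric(pdf, lo, hi, n=200_000):
    """Midpoint-rule estimate of -integral p log p over [lo, hi], in nats."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * h)
        if p > 0.0:
            total -= p * math.log(p) * h
    return total


def laplace_pdf(b):
    """Laplace density with scale b (variance 2 b^2)."""
    return lambda u: math.exp(-abs(u) / b) / (2.0 * b)


b = 1.0 / math.sqrt(2.0)  # unit variance
a = -3.0                  # aU is again Laplace, with scale |a| b
H_U = entropy_numeric(laplace_pdf(b), -40.0, 40.0)
H_aU = entropy_numeric(laplace_pdf(abs(a) * b), -120.0, 120.0)
gauss_bound = 0.5 * math.log(2.0 * math.pi * math.e)  # var U = 1

print(H_U, gauss_bound)                 # entropy lies below the Gaussian bound
print(H_aU - H_U, math.log(abs(a)))     # difference matches log|a|
```

The integration limits are truncated where the density is negligible; heavier-tailed densities would need wider limits.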

Next assume that U and V are real-valued random variables with joint probability density $p_{UV}(u, v)$, and marginal densities $p_U(u)$, $p_V(v)$, respectively, $-\infty < u, v < \infty$. The conditional entropy of U given V is usually defined as

$$H(U \mid V) = -\int p_V(v)\,dv\left[\int p_{U|V}(u \mid v)\log p_{U|V}(u \mid v)\,du\right], \tag{149}$$

where $p_{U|V}(u \mid v) = p_{UV}(u, v)/p_V(v)$ (when $p_V(v) > 0$) is the conditional density for U given V. Of course, we can write

$$H(U \mid V) = \int p_V(v)\,H(U \mid V = v)\,dv,$$

where $H(U \mid V = v)$ is the negative of the term in brackets in (149). Furthermore, the Shannon information between U and V is often taken as

$$I(U; V) = H(U) - H(U \mid V). \tag{150}$$

In this section we need to define $H(U \mid V)$ for a general random quantity V, in particular an infinite sequence of random variables. The simplest way to eliminate detailed mathematical technicalities is to exploit the fact that the information $I(U; V)$ for abstract random quantities U, V has been carefully defined, and many of its properties established in the literature.⁵,⁶ We then define $H(U \mid V)$ in terms of $I(U; V)$ and H(U) using (150). Thus, for U a real-valued random variable such that $H(U) < \infty$, and V an arbitrary random quantity, the conditional (differential) entropy of U given V is defined as

$$H(U \mid V) \triangleq H(U) - I(U; V). \tag{151}$$

Well-known properties of the Shannon information⁴,⁵ can now be used to verify these additional facts:

(d) $H(U \mid V) = H(U)$, for U, V independent, (152)

(e) $H(U \mid V) \ge H(U \mid V, W)$, (153)

(f) $H(U + f(V) \mid V) = H(U \mid V)$, (154)

(g) $H(U \mid V, f(V)) = H(U \mid V)$, (155)

where U is any random variable with $H(U) < \infty$, V, W are any random quantities, and f is any measurable function.

Finally, let $U = \{U_n\}_{n=-\infty}^{\infty}$ be a stationary real-valued random sequence. The entropy of the stationary sequence U is defined by†

$$H(U) \triangleq H(U_n \mid U_{-\infty}^{n-1}) = H(U_n \mid U_{n+1}^{\infty}). \tag{156}$$

We will be concerned with the entropy of a certain family of random sequences defined as follows. Let $Y = \{Y_n\}$ be a real-valued stationary random sequence such that $E|Y_n|^s < \infty$, for some s > 0. We are interested in the random sequence $X = \{X_n\}$ defined by

$$X_n = \sum_{m=0}^{M} a_m Y_{n-m}, \qquad -\infty < n < \infty, \tag{157}$$

where $M < \infty$ and for convenience $a_0 = 1$. Since $E|Y_n|^s$, $E|X_n|^s < \infty$, the entropies $H(Y) = H(Y_n \mid Y_{-\infty}^{n-1})$ and $H(X) = H(X_n \mid X_{-\infty}^{n-1})$ are meaningful. Our first task is to establish a relation between H(X) and H(Y).

The polynomial

$$Q(z) \triangleq \sum_{m=0}^{M} a_m z^m = \prod_{j=1}^{M}(1 - \alpha_j z) \tag{158}$$

is associated with the process X. The $\{\alpha_j^{-1}\}_{j=1}^{M}$ are the M, perhaps complex and/or repeated, roots of Q(z).

The main result of this section is a theorem which relates the 
entropy H(X) of the stationary sequence X to the entropy H( Y) of the 
sequence Y. After giving the proof, we show how to apply it to obtain 
a bound on the prediction error. 



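† The second equality of (156) follows from $H(U_n \mid U_{-\infty}^{n-1}) = \lim_{N\to\infty} H(U_n \mid U_{n-N}^{n-1})$, and (see Ref. 5)

$$\begin{aligned}
I(U_n; U_{n-N}^{n-1}) &= I(U_n; U_{n-1}) + I(U_n; U_{n-2} \mid U_{n-1}) + \cdots + I(U_n; U_{n-N} \mid U_{n-N+1}^{n-1}) \\
&= I(U_n; U_{n+1}) + I(U_n; U_{n+2} \mid U_{n+1}) + \cdots + I(U_n; U_{n+N} \mid U_{n+1}^{n+N-1}) \\
&= I(U_n; U_{n+1}^{n+N}).
\end{aligned}$$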
Theorem 2:
(a)

$$H(X) \ge H(Y) + \sum_{j:\,|\alpha_j| > 1}\log|\alpha_j| \triangleq H(Y) + \log A. \tag{159}$$

(b) If Y is ergodic and $E|Y_n| < \infty$, or if $|\alpha_j| \ne 1$, $1 \le j \le M$, then (159) holds with equality.

(c) We exhibit a nonergodic Y for which (159) holds with strict inequality.

Remark: A straightforward integration yields

$$\int_0^1 \log|Q(e^{i2\pi f})|\,df = \sum_{j=1}^{M}\int_0^1 \log|1 - \alpha_j e^{i2\pi f}|\,df = \sum_{j:\,|\alpha_j| > 1}\log|\alpha_j| = \log A. \tag{160a}$$

Kanter² showed that when $\{Y_n\}$ are i.i.d.,

$$H(X) \ge H(Y) + \int_0^1 \log|Q(e^{i2\pi f})|\,df. \tag{160b}$$

Of course, when the $\{Y_n\}$ are i.i.d., Theorem 2(b) implies that (160b) holds with equality.†
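
The integration in (160a) can be confirmed numerically. The following sketch is illustrative only: the particular $\alpha_j$ are arbitrary values chosen here (one real pair on either side of the unit circle and one complex-conjugate pair outside it), and the function names are hypothetical. It compares a midpoint-rule estimate of the integral with the sum over $|\alpha_j| > 1$.

```python
import cmath
import math


def log_A_from_alphas(alphas):
    """Right side of (160a): sum of log|alpha_j| over |alpha_j| > 1."""
    return sum(math.log(abs(a)) for a in alphas if abs(a) > 1.0)


def log_A_from_integral(alphas, n=20_000):
    """Left side of (160a): midpoint rule for the integral of log|Q(e^{i 2 pi f})| on [0, 1]."""
    total = 0.0
    for i in range(n):
        z = cmath.exp(2j * math.pi * (i + 0.5) / n)
        q = 1.0
        for a in alphas:
            q *= (1.0 - a * z)
        total += math.log(abs(q))
    return total / n


alphas = [0.5, 2.0, 1.5 + 1.5j, 1.5 - 1.5j]  # assumed example values
print(log_A_from_integral(alphas), log_A_from_alphas(alphas))
```

The quadrature assumes no $|\alpha_j| = 1$; a root of Q on the unit circle gives an integrable logarithmic singularity that a plain midpoint rule resolves only slowly.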

Proof: Let us factor Q(z) into $Q(z) = Q_1(z)Q_2(z)$, where

$$Q_1(z) = \prod_{j=1}^{M_1}(1 - \beta_j z) = \sum_{m=0}^{M_1} b_m z^m, \tag{161a}$$

$$Q_2(z) = \prod_{j=1}^{M_2}(1 - \gamma_j z) = \sum_{m=0}^{M_2} c_m z^m, \tag{161b}$$

where $|\beta_j| \ge 1$, $1 \le j \le M_1$, and $|\gamma_j| < 1$, $1 \le j \le M_2$. Thus $Q_1(z)$ corresponds to the roots of Q(z) inside and on the unit circle, and $Q_2(z)$ to the roots of Q(z) outside the unit circle. The $\{\beta_j\}$, $\{\gamma_j\}$ may be complex, but the $\{b_m\}$ and $\{c_m\}$ are real. Of course,

$$\log A = \sum_{j:\,|\alpha_j| > 1}\log|\alpha_j| = \sum_{j=1}^{M_1}\log|\beta_j| = \log\prod_{j=1}^{M_1}|\beta_j| = \log|b_{M_1}|. \tag{162}$$

y-1 



† Let us remark that Shannon, in his classic paper ("A Mathematical Theory of Communication," B.S.T.J., 27 (1948)), stated that (160b) always holds with equality, and gave an intuitive justification for this. We now know, however, that the equality will hold only if conditions, such as those in part (b), also hold.
Next define the delay operator D. Let $u = \{u_n\}_{n=-\infty}^{\infty}$ be a sequence. Then $(Du)_n = u_{n-1}$. Thus X can be written as $X = Q(D)Y = Q_1(D)Q_2(D)Y$. Let $W = Q_2(D)Y$, i.e.,

$$W_n = \sum_{m=0}^{M_2} c_m Y_{n-m}, \qquad c_0 = 1. \tag{163}$$

Then

$$\begin{aligned}
H(W) = H(W_n \mid W_{-\infty}^{n-1}) &\overset{(1)}{\ge} H(W_n \mid W_{-\infty}^{n-1}, Y_{-\infty}^{n-1}) \\
&= H\Big(Y_n + \sum_{m=1}^{M_2} c_m Y_{n-m} \,\Big|\, W_{-\infty}^{n-1}, Y_{-\infty}^{n-1}\Big) \\
&\overset{(2)}{=} H(Y_n \mid W_{-\infty}^{n-1}, Y_{-\infty}^{n-1}) \overset{(3)}{=} H(Y_n \mid Y_{-\infty}^{n-1}) = H(Y).
\end{aligned} \tag{164}$$

Step (1) follows from (153), step (2) from (154), and step (3) from (155), since $W_{-\infty}^{n-1}$ can be calculated from $Y_{-\infty}^{n-1}$ using (163).

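We now relate H(X) to H(W) using the relation $X = Q_1(D)W$, i.e.,

$$X_n = \sum_{m=0}^{M_1} b_m W_{n-m}.$$

Write

$$\begin{aligned}
H(X) = H(X_n \mid X_{n+1}^{\infty}) &\overset{(1)}{\ge} H(X_n \mid X_{n+1}^{\infty}, W_{n-M_1+1}^{\infty}) \\
&= H\Big(\sum_{m=0}^{M_1} b_m W_{n-m} \,\Big|\, X_{n+1}^{\infty}, W_{n-M_1+1}^{\infty}\Big) \\
&\overset{(2)}{=} H(b_{M_1}W_{n-M_1} \mid X_{n+1}^{\infty}, W_{n-M_1+1}^{\infty}) \\
&\overset{(3)}{=} H(b_{M_1}W_{n-M_1} \mid W_{n-M_1+1}^{\infty}) \\
&\overset{(4)}{=} \log|b_{M_1}| + H(W_{n-M_1} \mid W_{n-M_1+1}^{\infty}) \\
&= \log|b_{M_1}| + H(W) \overset{(5)}{=} \log A + H(W).
\end{aligned} \tag{165}$$

Step (1) follows from (153), step (2) from (154), step (3) from (155) and the fact that $X_{n+1}^{\infty}$ can be calculated from $W_{n-M_1+1}^{\infty}$, step (4) from (148), and step (5) from (162). Combining (164) and (165), we obtain $H(X) \ge H(Y) + \log A$, which is (159), i.e., part (a) of the theorem.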

We now inquire about the conditions under which (159) holds with equality. Clearly, this will happen if steps (1) in relations (164) and (165) hold with equality. From (155), this occurs if there exist measurable functions $g_1$, $g_2$ such that

$$Y_{-\infty}^{n-1} = g_1(W_{-\infty}^{n-1}) \quad \text{a.s.}, \tag{166a}$$

and

$$W_{n-M_1+1}^{\infty} = g_2(X_{n+1}^{\infty}) \quad \text{a.s.} \tag{166b}$$

Hence we shall show that when the conditions of part (b) are satisfied, then (166) will hold.

We begin by considering the following simple situation. Let $U = \{U_n\}$ be an arbitrary real-valued stationary random sequence, such that $E|U_n|^s < \infty$, for some s > 0. Let the random sequence $V = \{V_n\}$ be defined by

$$V_n = U_n - \xi U_{n-1}, \qquad -\infty < n < \infty, \tag{167}$$

where $\xi$ is a complex number. We can write, for N = 1, 2, ...,

$$U_n = \sum_{k=0}^{N-1}\xi^k V_{n-k} + \xi^N U_{n-N}, \qquad -\infty < n < \infty. \tag{168}$$
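
Identity (168) is a telescoping sum and can be checked on a simulated path. The sketch below is illustrative: it uses a finite stretch of i.i.d. Gaussian samples as a stand-in for the stationary sequence U (an assumption made here only to have numbers to work with), and returns both the residual of (168) and the size of the remainder term $\xi^N U_{n-N}$. The function name is hypothetical.

```python
import random


def check_identity_168(xi, n, N, seed=0):
    """Return (residual of (168), |xi^N U_{n-N}|) on one simulated path; needs N <= n."""
    rng = random.Random(seed)
    U = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]            # U_0, ..., U_n
    V = [None] + [U[m] - xi * U[m - 1] for m in range(1, n + 1)]  # V_1, ..., V_n
    partial = sum(xi ** k * V[n - k] for k in range(N))
    remainder = xi ** N * U[n - N]
    return abs(U[n] - (partial + remainder)), abs(remainder)


print(check_identity_168(0.8, 60, 50))   # remainder is already small for |xi| < 1
print(check_identity_168(1.5, 60, 10))   # identity still holds, remainder grows
```

For $|\xi| < 1$ the remainder shrinks geometrically in N, which is what the Borel-Cantelli argument that follows makes precise.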

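We now show that when $|\xi| < 1$, $\xi^N U_{n-N} \to 0$, as $N \to \infty$, a.s. This will follow when we show that for any $\epsilon > 0$, for N = 1, 2, ...,

$$P(|\xi^N U_N| > \epsilon, \text{ i.o.}) = 0. \tag{169}$$

To establish (169), invoke the Borel-Cantelli lemma and write

$$\sum_{N=1}^{\infty} P(|\xi^N U_N| > \epsilon) = \sum_{N=1}^{\infty} P(|U_N| > \epsilon|\xi|^{-N}) \le \sum_{N=1}^{\infty}\frac{E|U_N|^s}{(\epsilon|\xi|^{-N})^s} < \infty.$$

In (168), we let $N \to \infty$, and conclude that, when $|\xi| < 1$,

$$U_n = \sum_{k=0}^{\infty}\xi^k V_{n-k}, \quad \text{a.s.}, \tag{170a}$$

and that

$$U_{-\infty}^{n} = f_1(V_{-\infty}^{n}), \quad \text{a.s.}, \tag{170b}$$

where $f_1$ is the function defined by (170a).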

Return now to the random sequences X, Y related by $X = Q_1(D)Q_2(D)Y = Q_1(D)W$, where $W = Q_2(D)Y$. Using (161b), we have

$$W = Q_2(D)Y = \prod_{j=1}^{M_2}(1 - \gamma_j D)\,Y.$$

Invoking the result of (170) $M_2$ times, we conclude that $Y_{-\infty}^{n-1}$ is a.s. calculable from $W_{-\infty}^{n-1}$, i.e., (166a) holds.
We next investigate when (166b) will hold. Using (161a), we have

$$X = Q_1(D)W = \prod_{j=1}^{M_1}(1 - \beta_j D)\,W. \tag{171}$$

This prompts us to consider the process defined by (167) when $|\xi| > 1$. Rewriting (167) as

$$U_{n-1} = \xi^{-1}(U_n - V_n),$$

so that as in the derivation of (170) we have, for $|\xi| > 1$,

$$U_n = -\sum_{k=1}^{\infty}\xi^{-k}V_{n+k}, \qquad -\infty < n < \infty, \tag{172a}$$

so that

$$U_n = f_2(V_{n+1}^{\infty}), \qquad -\infty < n < \infty. \tag{172b}$$

If all the $|\beta_j| > 1$, $1 \le j \le M_1$, then application of (172) to (171) $M_1$ times yields (166b). Thus (159) will hold with equality when all $|\beta_j| \ne 1$ or equivalently all $|\alpha_j| \ne 1$.

We complete the proof of part (b) by showing that if Y is ergodic and $E|Y_n| < \infty$, then (166b) holds. It will suffice to show that, with U a stationary ergodic sequence with $E|U_n| < \infty$ and V defined by (167) with $|\xi| = 1$, we can calculate $U_n$ from $V_{n+1}^{\infty}$. From (167) we obtain, for $-\infty < n < \infty$, $1 \le k < \infty$,

$$U_n = -\sum_{j=1}^{k}\xi^{-j}V_{n+j} + \xi^{-k}U_{n+k}.$$

Thus†

$$U_n = \frac{1}{K}\sum_{k=1}^{K}U_n = -\frac{1}{K}\sum_{k=1}^{K}\sum_{j=1}^{k}\xi^{-j}V_{n+j} + \frac{1}{K}\sum_{k=1}^{K}\xi^{-k}U_{n+k}.$$

We will show that, as $K \to \infty$,

$$\frac{1}{K}\sum_{k=1}^{K}\xi^{-k}U_{n+k} \to c \quad \text{a.s.} \tag{173}$$

where c is a constant (in fact c = 0 for $\xi \ne 1$ and $c = EU_n$ for $\xi = 1$), which will imply that

$$U_n = -\lim_{K\to\infty}\frac{1}{K}\sum_{k=1}^{K}\sum_{j=1}^{k}\xi^{-j}V_{n+j} + c,$$

completing the proof of part (b).



† This step was suggested by A. Gersho.
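It remains to verify (173). Let $\theta$ be a complex random variable uniformly distributed on the unit circle, and independent of the sequence U. Then $\{\theta\xi^{-j}\}_{j=-\infty}^{\infty}$ is a stationary sequence, and therefore $\{\theta\xi^{-j}U_{n+j}\}_{j=-\infty}^{\infty}$ is also stationary. Thus, as $K \to \infty$,

$$\frac{1}{K}\sum_{k=1}^{K}\theta\xi^{-k}U_{n+k} \to \theta c,$$

a random variable with probability 1, and therefore

$$\frac{1}{K}\sum_{k=1}^{K}\xi^{-k}U_{n+k} \to c, \quad \text{a.s.} \tag{174}$$

Since the left member of (174) depends only on the tail σ-field of the $\{U_n\}$, and not on $\theta$, we conclude that c is a constant a.s. In fact, since the expectation

$$E\,\frac{1}{K}\sum_{k=1}^{K}\xi^{-k}U_{n+k} = (EU_n)\,\frac{1}{K}\sum_{k=1}^{K}\xi^{-k} \to \begin{cases} 0, & \xi \ne 1 \\ EU_n, & \xi = 1, \end{cases}$$

we know the value of c.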

Our final task, part (c) of Theorem 2, is to exhibit a situation in which $H(X) > H(Y) + \log A$. Let

$$X_n = Y_n - Y_{n-1},$$

and let

$$Y_n = \eta_n + \nu$$

where $\{\eta_n\}_{n=-\infty}^{\infty}$ are i.i.d. standard Gaussian variates, and $\nu$ is a random variable defined as follows. Let

*vA\ 11* z-0,1,2, 



and let 



■! 



n,>o, 




H,<0, 




n-j> 


o, 


7]_,< 


o, 



Then the binary expansion of $\nu$ is $0.\epsilon_0\epsilon_1\epsilon_2\epsilon_3\cdots$. Thus knowledge of $\nu$ is equivalent to knowledge of the sign of $\eta_n$, $-\infty < n < \infty$. Note that the sequence Y is stationary but not ergodic. Also $\log A = 0$, so that (159) is $H(X) \ge H(Y)$. Now

$$X_n = \eta_n - \eta_{n-1},$$

and $\{\eta_n\}$ is an ergodic sequence. Thus by part (b) $H(X) = H(\eta) = H(\eta_n) = \tfrac{1}{2}\log 2\pi e$, the last equality following by direct integration. We now consider H(Y). Observe that

$$\frac{1}{N}\sum_{j=1}^{N}Y_{n+j} = \left(\frac{1}{N}\sum_{j=1}^{N}\eta_{n+j}\right) + \nu \to \nu \quad \text{a.s.}$$

as $N \to \infty$. Thus $\nu$ and therefore $\epsilon = \{\epsilon_i\}_{i=0}^{\infty}$ are calculable from $Y_{n+1}^{\infty}$. It follows that

$$H(Y) = H(Y_n \mid Y_{n+1}^{\infty}) = H(Y_n \mid Y_{n+1}^{\infty}, \epsilon, \nu) = H(\eta_n \mid Y_{n+1}^{\infty}, \epsilon, \nu) \le H(\eta_n \mid \epsilon) = \tfrac{1}{2}\log\frac{\pi e}{2} < \tfrac{1}{2}\log 2\pi e = H(X).$$

This establishes part (c) and completes the proof of Theorem 2. Let us remark that it is possible, using the Cantor diagonalization method, to imbed a complete specification of the $\{\eta_n\}$ in $\nu$, thus making $H(Y) = -\infty$, without changing H(X).

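Our next task is the application of the theorem to our estimation problem. Let $X = \{X_n\}$ be any stationary process such that $H(X_n) < \infty$. Let $\hat X_n = f(X_{-\infty}^{n-1})$ be an estimator of $X_n$, and let the figure of merit be $E|X_n - \hat X_n|^r$, where r > 0. It follows that

$$\begin{aligned}
I(X_n; X_{-\infty}^{n-1}) &\overset{(1)}{\ge} I(X_n; \hat X_n) = H(X_n) - H(X_n \mid \hat X_n) \\
&\overset{(2)}{\ge} H(X_n) - \frac{1}{r}\log\left[\frac{2^r\,\Gamma^r(1/r)\,e\,E|X_n - \hat X_n|^r}{r^{r-1}}\right],
\end{aligned} \tag{175}$$

where step (1) follows from the data-processing theorem (which states that processing $X_{-\infty}^{n-1}$ to form $\hat X_n$ decreases information),⁶,⁴ and step (2) from (146) and the concavity of the logarithm. Since $I(X_n; X_{-\infty}^{n-1}) = H(X_n) - H(X)$, (175) yields

$$E|X_n - \hat X_n|^r \ge \frac{r^{r-1}}{2^r\,\Gamma^r(1/r)\,e}\,e^{rH(X)}. \tag{176a}$$

In the special case r = 2, (176) becomes

$$E|X_n - \hat X_n|^2 \ge \frac{1}{2\pi e}\,e^{2H(X)}. \tag{176b}$$

Inequalities (176) hold for any stationary process X. When X is given in terms of another process Y by (157), we can use part (a) of Theorem 2 to continue these inequalities, i.e.,

$$E|X_n - \hat X_n|^r \ge \frac{r^{r-1}A^r}{2^r\,\Gamma^r(1/r)\,e}\,e^{rH(Y)}, \tag{177a}$$

and, for r = 2,

$$E|X_n - \hat X_n|^2 \ge \frac{A^2}{2\pi e}\,e^{2H(Y)}, \tag{177b}$$

where A is given in (160a).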
Inequality (177b) has an interesting interpretation. Assume that the spectral density of Y,

$$S_Y(f) = \sum_{n=-\infty}^{\infty}\rho_Y(n)\,e^{i2\pi nf}, \tag{178}$$

where $\rho_Y(n) = EY_mY_{m+n}$, $-\infty < n < \infty$, exists. Then the spectral density of X is

$$S_X(f) = S_Y(f)\,|Q(e^{i2\pi f})|^2.$$

Using (160a) and the well-known formula for $\epsilon_{\infty\,\mathrm{lin}}^{*2}$,¹

$$\begin{aligned}
2\log A &= \int_0^1 \log S_X(f)\,df - \int_0^1 \log S_Y(f)\,df \\
&= \log\epsilon_{\infty\,\mathrm{lin}}^{*2} - \int_0^1 \log S_Y(f)\,df,
\end{aligned}$$

so that

$$A^2 = \frac{\epsilon_{\infty\,\mathrm{lin}}^{*2}}{\bar S_Y}. \tag{179}$$

Here $\epsilon_{\infty\,\mathrm{lin}}^{*2}$ is, as in Section I, the best linear mean-squared prediction error, and $\bar S_Y$ is the geometric mean of the spectral density of Y. Substituting (179) into (177b), we have

$$E|X_n - \hat X_n|^2 \ge \epsilon_{\infty\,\mathrm{lin}}^{*2}\left[e^{2H(Y)}/(2\pi e\,\bar S_Y)\right]. \tag{180}$$

Now it is not hard to show that when U is a Gaussian process with spectral density $S_U$, $H(U) = \tfrac{1}{2}\log(2\pi e\,\bar S_U)$. Thus $e^{2H(Y)}/2\pi e$ is the geometric mean of the spectral density of a Gaussian process with the same entropy as Y. This is called the "entropy power" of Y. Thus the quantity in brackets in (180) is the ratio of the entropy power of Y to $\bar S_Y$, and is unity when Y is Gaussian. Kanter² obtained (180) when the $\{Y_n\}$ are i.i.d. We have proved it for any stationary $Y_n$ with $E|Y_n|^s < \infty$ for some s > 0.
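
For i.i.d. $\{Y_n\}$ with unit variance the spectral density is flat, so $\bar S_Y = 1$ and $H(Y) = H(Y_n)$, and the bracketed factor in (180) reduces to $e^{2H(Y_n)}/2\pi e$. The short sketch below evaluates this factor for a few unit-variance densities using their standard closed-form entropies in nats. The list of distributions and the helper name are choices made here for illustration, not taken from the text.

```python
import math


def entropy_power_ratio(h_nats, geometric_mean_spectrum=1.0):
    """Bracketed factor of (180): e^{2H} / (2 pi e S_bar)."""
    return math.exp(2.0 * h_nats) / (2.0 * math.pi * math.e * geometric_mean_spectrum)


# Differential entropies (nats) of unit-variance densities.
entropies = {
    "gaussian": 0.5 * math.log(2.0 * math.pi * math.e),
    "uniform": math.log(2.0 * math.sqrt(3.0)),     # support (-sqrt 3, sqrt 3)
    "laplace": 1.0 + math.log(math.sqrt(2.0)),     # scale 1/sqrt 2
    "exponential": 1.0,                            # rate 1, shifted to mean 0
}

for name, h in entropies.items():
    print(name, entropy_power_ratio(h))
```

The factor is at most unity, with equality for the Gaussian case, so the smaller it is the more room (180) leaves for a nonlinear predictor to improve on the best linear one.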
APPENDIX A
Let

$$X_n = Y_n - aY_{n-1}, \qquad EY_n = 0, \quad EY_n^2 = 1, \qquad n = 0, \pm 1, \pm 2, \ldots \tag{181}$$

with the Y's i.i.d. variates. For a > 0, let

$$f_k^*(x_1, x_2, \ldots, x_k \mid a) = E(X_n \mid X_{n-1} = x_1, X_{n-2} = x_2, \ldots, X_{n-k} = x_k), \tag{182}$$

$$\epsilon_k^{*2}(a) = E[X_n - E(X_n \mid X_{n-1}, X_{n-2}, \ldots, X_{n-k})]^2. \tag{183}$$

Theorem 3: If the Y's in (181) are symmetric, i.e., if $p_Y(y) = p_Y(-y)$, and if for a > 0 (182) and (183) hold, then for a < 0

$$E(X_n \mid X_{n-1} = x_1, \ldots, X_{n-k} = x_k) = f_k^*(-x_1, x_2, -x_3, \ldots, (-1)^k x_k \mid |a|),$$

$$E[X_n - E(X_n \mid X_{n-1}, \ldots, X_{n-k})]^2 = \epsilon_k^{*2}(|a|).$$

Proof: In (181) set

$$X'_n = (-1)^{n+\sigma}X_n, \qquad Y'_n = (-1)^{n+\sigma}Y_n, \qquad n = 0, \pm 1, \ldots \tag{184}$$

for some integer $\sigma$. In terms of these new variables, (181) becomes

$$X'_n = Y'_n + aY'_{n-1}, \qquad EY'_n = 0, \quad EY'^2_n = 1.$$

Furthermore $p_{Y'}(y) = p_Y(y)$. Thus if a = -b < 0 the $X'_n$ have the same distribution as the $X_n$ do with a = b in (181). Thus if a < 0

$$E(X'_n \mid X'_{n-1} = x_1, \ldots, X'_{n-k} = x_k) = f_k^*(x_1, x_2, \ldots, x_k \mid |a|).$$

Now in (184) let $\sigma = -n$, so that $X'_n = X_n$, $X'_{n-1} = -X_{n-1}$, $X'_{n-2} = X_{n-2}$, etc. But then $E(X'_n \mid X'_{n-1} = x_1, \ldots, X'_{n-k} = x_k) = E(X_n \mid X_{n-1} = -x_1, X_{n-2} = x_2, \ldots, X_{n-k} = (-1)^k x_k)$ and the theorem readily follows.

APPENDIX B 

Let $Y_0$ and $Y_1$ be i.i.d. with $EY_0^2 < \infty$. We shall show that

$$E\operatorname{var}(Y_1 \mid Y_1 - Y_0) \le E\operatorname{var}(Y_1 \mid Y_1 + Y_0). \tag{185}$$

This says that, in estimating $Y_1$ in the mean-square sense, the average error is less if the difference $Y_1 - Y_0$ is known than if the sum is known. If $Y_0$ and $Y_1$ are replaced by $Y_{n-1}$ and $Y_n$ where $X_n = Y_n - aY_{n-1}$ as in Section I, then (185) states that $\epsilon_1^{*2}(a = -1) \le \epsilon_1^{*2}(a = +1)$, as was asserted in Section I. We prove (185) below, together with the assertion that equality holds in (185) only if the $Y_i$ are symmetric, provided that $Y_0$ has a characteristic function $\phi(t) = Ee^{iY_0t}$ which is nowhere zero. However, in general there are nonsymmetric $Y_i$ for which equality holds in (185).
The following proof of (185) was suggested by C. Mallows: Assume without loss of generality that $EY_1 = 0$, $EY_1^2 = 1$. Observe first that the right side of (185) is ½ because

$$E(Y_1 \mid Y_1 + Y_0) = E(Y_0 \mid Y_1 + Y_0) = E\left[\frac{Y_1 + Y_0}{2}\,\Big|\, Y_1 + Y_0\right] = \frac{Y_1 + Y_0}{2}$$

and

$$E\left[Y_1 - \frac{Y_1 + Y_0}{2}\right]^2 = \frac{1}{2}.$$

Set $f(z) = E(Y_1 + Y_0 \mid Y_1 - Y_0 = z)$ and note that

$$Ef(Y_1 - Y_0) = 0, \qquad E(Y_1 - Y_0)f(Y_1 - Y_0) = E(Y_1 - Y_0)(Y_1 + Y_0) = 0. \tag{186}$$

Now

$$E[Y_1 \mid Y_1 - Y_0] = E\left[\frac{Y_1 - Y_0}{2} + \frac{Y_1 + Y_0}{2}\,\Big|\, Y_1 - Y_0\right] = \frac{Y_1 - Y_0}{2} + \frac{1}{2}f(Y_1 - Y_0). \tag{187}$$

Note that

$$\begin{aligned}
E\operatorname{var}[Y_1 \mid Y_1 - Y_0] &= EE[(Y_1 - E[Y_1 \mid Y_1 - Y_0])^2 \mid Y_1 - Y_0] \\
&= E(Y_1 - E(Y_1 \mid Y_1 - Y_0))^2 = 1 - E(E[Y_1 \mid Y_1 - Y_0])^2.
\end{aligned} \tag{188}$$

But using (186) and (187),

$$E(E[Y_1 \mid Y_1 - Y_0])^2 = \operatorname{var}(E[Y_1 \mid Y_1 - Y_0]) = E\left(\frac{Y_1 - Y_0}{2}\right)^2 + E\left(\frac{1}{2}f(Y_1 - Y_0)\right)^2 \ge \frac{1}{2} \tag{189}$$

and (185) is proved.
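
Inequality (185) can be checked by direct enumeration for a discrete Y. The sketch below is illustrative only: the three-point distribution is an arbitrary nonsymmetric example chosen here, and the helper name is hypothetical. It also exercises the fact, used in the proof, that the right side of (185) equals half the variance of Y.

```python
from collections import defaultdict
from itertools import product


def expected_cond_var(values, probs, combine):
    """E var(Y1 | combine(Y1, Y0)) for i.i.d. discrete Y0, Y1."""
    acc = defaultdict(lambda: [0.0, 0.0, 0.0])  # statistic -> [P, E[Y1; .], E[Y1^2; .]]
    for (y0, p0), (y1, p1) in product(list(zip(values, probs)), repeat=2):
        s = acc[round(combine(y1, y0), 9)]
        w = p0 * p1
        s[0] += w
        s[1] += w * y1
        s[2] += w * y1 * y1
    return sum(m2 - m1 * m1 / m0 for m0, m1, m2 in acc.values())


values, probs = [-1.0, 0.0, 3.0], [0.5, 0.3, 0.2]   # assumed example
given_diff = expected_cond_var(values, probs, lambda y1, y0: y1 - y0)
given_sum = expected_cond_var(values, probs, lambda y1, y0: y1 + y0)
print(given_diff, given_sum)
```

For a symmetric two-point Y the two quantities coincide, which is consistent with the equality discussion that follows.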

To prove the assertions about equality, let $\phi(t) = Ee^{iYt}$ and note that

$$\phi(u - v)\,\phi(u + v) = EE[e^{i(Y_0 + Y_1)u} \mid Y_0 - Y_1]\,e^{i(Y_0 - Y_1)v}. \tag{190}$$

Now equality holds in (189) if and only if $f(z) = 0$, which is the same as requiring that the derivative of the left side of (190) at u = 0 vanish for all v. That is,

$$\phi'(v)\phi(-v) + \phi'(-v)\phi(v) = 0, \qquad -\infty < v < \infty. \tag{191}$$

If $\phi \ne 0$, this says that $(\log\phi(v))' = \phi'(v)/\phi(v)$ is odd and so $\log\phi(v)$ and $\phi(v)$ are even, i.e., Y is symmetric. Note that a characteristic function is real if and only if the corresponding random variable is symmetric.
Our aim now is to show that (191) can hold without $\phi$ being real. We shall construct a $\phi_0(t)$ that satisfies (191) and is positive definite, real for $|t| < 1$, and pure imaginary for $|t| > 1$. We write

$$\phi_0(t) = \phi_1(t) + \epsilon\phi_2(t) \tag{192}$$

and denote Fourier transforms by upper-case letters so that

$$\phi_j(t) = \int_{-\infty}^{\infty}\Phi_j(x)\,e^{ixt}\,dx, \qquad \Phi_j(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\phi_j(t)\,e^{-ixt}\,dt, \qquad j = 0, 1, 2. \tag{193}$$

Now choose fo(t) 4 to be purely imaginary, to be six times 
continuously differentiable and of compact support, and to satisfy 

From (193) one readily finds that 

**(*) -$*<*) (195) 

and that there exists a positive constant Ci such that 

|02W| -(iTW- (196) 

We next choose

$$\Phi_1(x) = c_2[H^2(x - 1) + H^2(x + 1)] \tag{197}$$

where

$$H(x) = \frac{1}{4\pi}\left[\frac{\sin(x/4)}{x/4}\right]^2. \tag{198}$$

Here $c_2 > 0$ is a constant chosen to make

$$\phi_0(0) = \phi_1(0) = \int_{-\infty}^{\infty}\Phi_1(x)\,dx = 1. \tag{199}$$


J— 00 

Note that &i(x) > for all x and that, for all x, x A <bi(x) > c 3 > 0. Since 
from (195) $ 2 (x) is real and from (196) is 0(l/x 6 ), it is possible to 
choose € sufficiently small so that 

Qo(x) = $i(x) + €0 2 (x) > 

for all x. <}>o is thus the transform of a real positive function and satisfies 
the normalization (199). It is therefore a characteristic function. 
We next note that

$$\phi_1(t) = 0, \qquad |t| > 1. \tag{200}$$

This follows immediately from the fact that

$$\int_{-\infty}^{\infty}H(x)\,e^{ixt}\,dx = \begin{cases} 1 - 2|t|, & |t| \le \tfrac{1}{2} \\ 0, & |t| > \tfrac{1}{2} \end{cases}$$

has support $|t| < \tfrac{1}{2}$ so that the transforms of $H^2(x - 1)$ and of $H^2(x + 1)$ both have support $|t| < 1$. Equation (197) then implies (200). The form of (197) also shows that

$$\phi_1(t) = \phi_1(-t) \tag{201}$$

and that $\phi_1(t)$ is twice differentiable.

We have now constructed a twice differentiable characteristic function $\phi_0(t) = \phi_1(t) + \epsilon\phi_2(t)$ where real even $\phi_1$ vanishes for $|t| > 1$ and pure imaginary odd $\phi_2$ vanishes for $|t| < 1$. For all t, then, where $\phi_0(t) \ne 0$, the quantity $\phi_0'(t)/\phi_0(t)$ is real and odd. Thus $\phi_0'(t)/\phi_0(t) = (\phi_0'(t)/\phi_0(t))^* = -\phi_0'(-t)/\phi_0(-t)$. That is, the characteristic function $\phi_0$ satisfies (191) and is not real (for $|t| > 1$). QED.

Note added in print. We have recently learned of the work of M. 
Rosenblatt, 7 which overlaps the present paper slightly. 